What is Q Learning?

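Q-Learning works in the same setting as the MDP: an agent in state $s$ takes an action $a$, receives a reward $R(s,a)$, and moves to a new state $s'$. One key difference from solving the MDP directly is that Q-Learning applies Temporal Difference (TD) learning, so each Q-value changes gradually: every update moves it only a fraction $\alpha$ (the learning rate) of the way toward the new estimate. Why make the changes gradual? Because the environment can be stochastic, and an action occasionally leads to a less probable outcome. A gradual update keeps one such rare event from overwriting what was learned from many typical ones. Using these Q-values, we can develop a deep learning algorithm called Deep Q-Learning.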
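To see why gradual changes help, here is a toy sketch (not full Q-Learning) of a TD-style running estimate. The helper names `td_update` and `run` and the outcome values are hypothetical, made up for illustration: 50 typical outcomes of 1.0 followed by one rare outcome of 10.0.

```python
def td_update(estimate, target, alpha):
    """Move `estimate` a fraction `alpha` of the way toward `target`."""
    return estimate + alpha * (target - estimate)


def run(outcomes, alpha):
    """Feed a sequence of observed outcomes through repeated TD-style updates."""
    estimate = 0.0
    for target in outcomes:
        estimate = td_update(estimate, target, alpha)
    return estimate


# Hypothetical outcomes: 50 typical results of 1.0, then one rare result of 10.0.
outcomes = [1.0] * 50 + [10.0]

print(run(outcomes, alpha=1.0))            # prints 10.0: the rare event overwrites everything
print(round(run(outcomes, alpha=0.1), 2))  # prints 1.9: the rare event only nudges the estimate
```

With $\alpha = 1$ the estimate simply becomes the last outcome, while with $\alpha = 0.1$ the single rare outcome shifts it only slightly away from the typical value.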

Formula of Q Learning

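$ Q_{t}(s,a) = Q_{t-1}(s,a) + \alpha\left(R(s,a) + \gamma \max_{a'} Q_{t-1}(s',a') - Q_{t-1}(s,a)\right) $

Here $\alpha$ is the learning rate, $\gamma$ is the discount factor, and $s'$ is the state reached after taking action $a$ in state $s$. The expression in parentheses is the temporal difference: the gap between the new estimate $R(s,a) + \gamma \max_{a'} Q_{t-1}(s',a')$ and the old value $Q_{t-1}(s,a)$.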
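The formula translates almost line for line into code. Below is a minimal tabular sketch, assuming Q-values are stored in a dictionary keyed by `(state, action)` and that unseen pairs start at 0. The states `"A"` and `"B"`, the actions, the reward of -1 and the stored value of 10 are all hypothetical, and terminal states are not handled.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One Q-Learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # max over a' of Q(s', a')
    temporal_difference = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * temporal_difference
    return Q[(s, a)]


Q = defaultdict(float)           # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
Q[("B", "right")] = 10.0         # hypothetical value already learned for state B

# Taking "right" in A costs -1 and leads to B:
# TD = -1 + 0.9 * 10 - 0 = 8, so Q(A, right) moves halfway there.
print(q_update(Q, "A", "right", r=-1.0, s_next="B", actions=actions))  # prints 4.0
```

Repeating the same update keeps moving $Q(A, right)$ toward the target of 8, by half the remaining gap each time with $\alpha = 0.5$.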

What is Deep Q Learning?

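Deep Q-Learning is a deep learning algorithm that uses Q-values: a neural network takes the state as input and predicts a Q-value for each action. It learns by minimizing the loss

$ L = \sum \left(Q_{target} - Q\right)^2 $

where $Q$ is the network's prediction for the action taken, and the Q-Target is the value calculated through Q Learning, $R(s,a) + \gamma \max_{a'} Q(s',a')$. Deep Q-Learning minimizes $L$ by adjusting the network's weights.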
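The loss computation can be sketched in plain Python. This is a minimal illustration, assuming a linear model in place of the neural network so the gradient can be written by hand. The names `predict`, `q_target` and `loss_and_step`, the `done` flag marking the end of an episode, and the batch contents are hypothetical. A real implementation would use a neural network and a framework's automatic differentiation.

```python
def predict(W, state):
    """Linear stand-in for the Q-network: Q(s, a) = sum_i W[a][i] * state[i]."""
    return [sum(w * x for w, x in zip(W[a], state)) for a in range(len(W))]


def q_target(W, reward, next_state, gamma, done):
    """Q-Target = R + gamma * max_a' Q(s', a'), or just R if the episode ended."""
    if done:
        return reward
    return reward + gamma * max(predict(W, next_state))


def loss_and_step(W, batch, gamma=0.9, lr=0.1):
    """Compute L = sum (Q-Target - Q)^2 over the batch, then take one gradient-descent step."""
    # Targets use the current weights and are treated as constants during the step.
    targets = [q_target(W, r, s2, gamma, done) for (s, a, r, s2, done) in batch]
    grads = [[0.0] * len(W[0]) for _ in W]
    loss = 0.0
    for (s, a, r, s2, done), target in zip(batch, targets):
        error = target - predict(W, s)[a]
        loss += error ** 2
        # dL/dW[a][i] = -2 * error * s[i]; only the taken action's weights get a gradient.
        for i, x in enumerate(s):
            grads[a][i] += -2.0 * error * x
    for act in range(len(W)):
        for i in range(len(W[act])):
            W[act][i] -= lr * grads[act][i]
    return loss


W = [[0.0, 0.0], [0.0, 0.0]]  # hypothetical sizes: 2 actions x 2 state features
batch = [((1.0, 0.0), 1, 1.0, (0.0, 1.0), True)]  # (s, a, r, s', done)
for step in range(3):
    print(step, round(loss_and_step(W, batch), 4))  # the loss shrinks each step
```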

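After learning is finished, Deep Q-Learning chooses an action by passing the predicted Q-values through a softmax function, which converts them into probabilities, so actions with higher Q-values are more likely to be chosen.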
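A minimal sketch of softmax action selection follows, assuming the network's Q-values for the current state are already available as a list. The `temperature` parameter and the Q-values are assumptions added for illustration: a lower temperature concentrates probability on the best action.

```python
import math
import random


def softmax(q_values, temperature=1.0):
    """Turn Q-values into probabilities; lower temperature favors the best action more."""
    m = max(q_values)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(q_values, temperature=1.0, rng=random):
    """Sample an action index with probability given by softmax(q_values)."""
    probs = softmax(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q_values = [1.0, 2.0, 0.5]  # hypothetical network outputs for three actions
print([round(p, 3) for p in softmax(q_values)])
print(choose_action(q_values, rng=random.Random(0)))
```

Sampling, rather than always taking the largest Q-value, still lets lower-valued actions be tried occasionally.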

Further Reading

Living Penalty

Bellman Equation

Markov Decision Process (MDP)