
Q Learning

What is Q Learning?

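Q-Learning builds on the Markov decision process (MDP). The key difference is that Q-Learning applies Temporal Difference (TD) so that each Q-value changes gradually instead of being overwritten in a single step. Why should the changes happen gradually? Because the environment is not deterministic: a less probable event can occur on any given step, and a single unusual outcome should only nudge the estimate rather than replace it. The Q-values learned this way are also the basis of a deep learning algorithm called Deep Q-Learning.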



Formula of Q Learning

$ Q_{t}(s, a)\ =\ Q_{t-1}(s,a)\ +\ \alpha\left(R(s,a)\ +\ \gamma \max_{a'} Q_{t-1}(s',a')\ -\ Q_{t-1}(s,a)\right) $
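
To make the update rule concrete, here is a minimal sketch of one temporal-difference step on a table of Q-values. The function name `q_update`, the state and action labels, and the values `alpha=0.1` and `gamma=0.9` are all assumptions chosen for illustration, not something prescribed by the formula.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one temporal-difference update to q[(state, action)] in place."""
    # Best Q-value reachable from the next state (unseen pairs default to 0.0).
    best_next = max(q[(next_state, a)] for a in actions)
    # TD error: (reward + discounted best next value) minus the current estimate.
    td_error = reward + gamma * best_next - q[(state, action)]
    # Move the estimate only a fraction alpha of the way toward the target.
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Hypothetical two-action environment, observed twice with the same transition.
q = defaultdict(float)
actions = ["left", "right"]
first = q_update(q, "s0", "right", 1.0, "s1", actions)
second = q_update(q, "s0", "right", 1.0, "s1", actions)
```

With these assumed values the estimate for `("s0", "right")` moves from 0 to about 0.1 and then to about 0.19, which is the gradual change that `alpha` controls.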





What is Deep Q Learning?

Deep Q-Learning is a deep learning algorithm that uses Q-values: a neural network takes the state as input and predicts a Q-value for each possible action. It learns as follows:



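Q-Targets are the values calculated through Q Learning, that is, $R(s,a) + \gamma \max_{a'} Q(s',a')$. Deep Q-Learning aims to minimize the loss $L$, which measures how far the network's predicted Q-values are from the Q-Targets (typically as a squared difference), by adjusting the network's weights.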
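
As a rough illustration of what is being minimized, the sketch below computes Q-Targets for a small batch and compares them with predicted Q-values. It assumes a sum-of-squared-errors loss, which is a common choice but is an assumption here; the helper names, the batch contents, and `gamma=0.9` are all hypothetical, and no actual neural network is involved.

```python
def q_target(reward, next_q_values, gamma=0.9):
    """Q-Target for one transition: R + gamma * max over a' of Q(s', a')."""
    return reward + gamma * max(next_q_values)


def squared_error_loss(targets, predictions):
    """Sum of squared differences between Q-Targets and predicted Q-values."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))


# Hypothetical batch: (reward, Q-values predicted for the next state s').
batch = [(1.0, [0.5, 2.0]), (0.0, [1.0, 0.0])]
targets = [q_target(r, next_q) for r, next_q in batch]

# Hypothetical network outputs Q(s, a) for the actions that were actually taken.
predictions = [2.0, 1.0]
loss = squared_error_loss(targets, predictions)
```

In a real Deep Q-Learning setup the predictions would come from the network, and a deep learning library would backpropagate this loss to adjust the weights.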

After learning is finished, Deep Q-Learning chooses an action through the softmax function, which converts the Q-values into a probability for each action.
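
The softmax step can be sketched in a few lines of standard-library Python. The `temperature` parameter and the function names are assumptions added for illustration; the Q-values are made-up numbers rather than network outputs.

```python
import math
import random


def softmax(q_values, temperature=1.0):
    """Turn a list of Q-values into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; it does not change the result.
    highest = max(q_values)
    exps = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(q_values, temperature=1.0, rng=random):
    """Sample an action index with probability given by softmax over the Q-values."""
    probs = softmax(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical Q-values for three actions.
probs = softmax([1.0, 2.0, 3.0])
action = choose_action([1.0, 2.0, 3.0], rng=random.Random(0))
```

Actions with higher Q-values are picked more often, but lower-valued actions still get chosen occasionally, so the agent keeps exploring.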



This post is licensed under CC BY 4.0 by the author.