What is Q-Learning?
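Q-Learning builds on the Markov decision process (MDP) and is almost the same in structure. The key difference is that Q-Learning applies temporal difference (TD) learning, so each Q-value changes gradually instead of being replaced outright. Why gradually? Because the environment is stochastic: any single step may produce an unlikely outcome, and overwriting the Q-value with that one sample would distort the estimate. Moving only part of the way toward each new sample keeps rare events from dominating. Using these Q-values, we can develop a deep learning algorithm called Deep Q-Learning.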
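To make the "gradual change" concrete, here is a minimal Python sketch of a single temporal difference update. The function name `td_update`, the learning rate `alpha`, the discount factor `gamma`, and the numbers used are all assumptions chosen for illustration, not values from this text.

```python
def td_update(q_old, reward, max_q_next, alpha=0.1, gamma=0.9):
    """Move q_old a fraction alpha of the way toward the TD target."""
    # TD target: observed reward plus the discounted best Q-value of the next state.
    td_target = reward + gamma * max_q_next
    # TD error: how far the current estimate is from that target.
    td_error = td_target - q_old
    return q_old + alpha * td_error


# Hypothetical scenario: the estimate is 1.0 after many typical outcomes,
# then a rare, very bad outcome (reward -10.0) is observed in a terminal state.
q_before = 1.0
q_after_rare = td_update(q_before, reward=-10.0, max_q_next=0.0)

# With alpha = 0.1 the estimate moves only 10% of the way from 1.0 toward -10.0,
# so one unlikely event does not overwrite what was learned before.
print(q_before, q_after_rare)
```

With `alpha=1.0` the same call would jump straight to the new sample, which is exactly the behaviour temporal difference learning is meant to avoid.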
Formula of Q-Learning
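In its standard textbook form, the Q-Learning update with temporal difference can be written as below. The symbols are the usual notation and are assumed here rather than taken from this text: $s_t$ and $a_t$ are the current state and action, $r_t$ is the reward received, $\alpha$ is the learning rate, and $\gamma$ is the discount factor.

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
$$

The bracketed term is the temporal difference. Because it is scaled by $\alpha$, with $0 < \alpha \le 1$, each update moves the Q-value only part of the way toward the new sample.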
What is Deep Q-Learning?
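Deep Q-Learning is a deep learning algorithm that uses Q-values: a neural network takes the state as input and outputs a predicted Q-value for each possible action.
The Q-targets are the values calculated through Q-Learning. Deep Q-Learning adjusts the network's weights to minimize the loss $L$, which measures the gap between the Q-targets and the predicted Q-values.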
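One common way to write this loss, assuming a squared-error form, is shown below. Here $Q_{\text{target}}$ is the Q-target, $Q$ is the Q-value predicted by the network, and the sum runs over the actions (or samples) being compared; the exact form is an assumption, and some presentations add a factor such as $\tfrac{1}{2}$ or average instead of summing.

$$
L = \sum \left( Q_{\text{target}} - Q \right)^2,
\qquad
Q_{\text{target}} = r + \gamma \max_{a'} Q(s', a')
$$

The loss is propagated back through the network, and the weights are updated so that the predicted Q-values move closer to the Q-targets.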
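After learning is finished, Deep Q-Learning chooses an action by passing the predicted Q-values through the softmax function, which turns them into a probability distribution over the actions. The action is then sampled from that distribution, so actions with higher Q-values are chosen more often.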
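Softmax action selection can be sketched in a few lines of plain Python. The names `softmax` and `select_action`, the `temperature` parameter, and the example Q-values are assumptions made for illustration; a real implementation would typically use a deep learning framework's built-in softmax instead.

```python
import math
import random


def softmax(q_values, temperature=1.0):
    """Turn a list of Q-values into probabilities that sum to 1."""
    # Subtracting the maximum keeps math.exp from overflowing on large Q-values.
    highest = max(q_values)
    exps = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature=1.0, rng=random):
    """Sample an action index with probability given by softmax over the Q-values."""
    probs = softmax(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical Q-values predicted by the network for three actions.
q_values = [1.0, 2.0, 3.0]
probs = softmax(q_values)
action = select_action(q_values)
print(probs, action)
```

A lower `temperature` makes the choice closer to always picking the highest Q-value, while a higher one makes the choice more uniform and therefore more exploratory.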