What is an MDP?
Since few things are 100% certain in the real world, a formulation more general than the deterministic Bellman Equation is needed. For this reason, the Markov Decision Process (MDP) was devised.
Assume there is a case where the robot needs to choose a direction.
Under the deterministic Bellman Equation, the robot moves up 100% of the time, because the equation assigns no probability to going left or right.
In an MDP, however, the robot can end up moving in various directions, because each action carries some probability of going left or right.
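A minimal sketch of that difference in Python; the 0.8/0.1/0.1 split is an invented example, not a value from the text:

```python
import random

# Hypothetical transition model for the grid example above:
# the robot *intends* to move up, but the world is stochastic.
transition_probs = {
    "up": 0.8,     # intended direction (assumed probability)
    "left": 0.1,   # slips left (assumed)
    "right": 0.1,  # slips right (assumed)
}

def step(probs):
    """Sample the direction the robot actually moves in."""
    directions = list(probs)
    weights = [probs[d] for d in directions]
    return random.choices(directions, weights=weights)[0]

# Deterministic Bellman world: the robot would always go up.
# MDP world: mostly up, occasionally left or right.
print(step(transition_probs))
```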
The formula of an MDP
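Using the symbols listed below (plus a discount factor γ, which is assumed here since the list does not define it), the MDP version of the Bellman Equation can be written as:

V(s) = max_a [ R(s, a) + γ · Σ_{s'} P(s, a, s') · V(s') ]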
- s : The current (given) state.
- s' : The next state, i.e. the state reached after leaving s.
- a : An action taken in state s.
- R(s, a) : The reward received for taking action a in state s.
- P(s, a, s') : The probability of ending up in s' after taking action a in state s.
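To make the formula concrete, here is a minimal value-iteration sketch in Python; the two-state MDP, its rewards, and γ = 0.9 are invented for illustration:

```python
# A minimal value-iteration sketch of the formula above. The two-state MDP,
# its rewards, and the discount factor gamma = 0.9 are invented examples.

GAMMA = 0.9  # assumed discount factor

# P[s][a] -> list of (s_next, probability); R[s][a] -> immediate reward
P = {
    "s0": {"stay": [("s0", 1.0)],
           "go":   [("s1", 0.8), ("s0", 0.2)]},
    "s1": {"stay": [("s1", 1.0)],
           "go":   [("s0", 0.8), ("s1", 0.2)]},
}
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 2.0, "go": 0.0},
}

V = {s: 0.0 for s in P}  # start with zero value for every state

for _ in range(100):  # repeat the update until the values settle
    # V(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s, a, s') * V(s') ]
    V = {
        s: max(
            R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
            for a in P[s]
        )
        for s in P
    }

print(V)  # converged state values
```

Note how the inner sum weights each possible next state s' by P(s, a, s'); in the deterministic Bellman Equation that sum collapses to a single term with probability 1.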