
Markov Decision Process (MDP)

What is MDP?

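Few outcomes are 100% certain in the real world, so the deterministic Bellman equation, which assumes every action leads to exactly one known next state, is not enough.
The Markov Decision Process (MDP) was devised for this reason: it treats the result of an action as a probability distribution over possible next states.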

Suppose a robot has to choose a direction to move in.



Under the deterministic Bellman equation, a robot that chooses to move up moves up with 100% probability, because that equation has no notion of the robot ending up to the left or right instead.



In an MDP, however, a robot that chooses to move up may actually end up going left or right, because each of those outcomes has some probability of occurring.
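The difference between the two cases can be sketched with a small calculation. The numbers here are hypothetical and not from the original example: the intended move succeeds with probability 0.8, the robot slips left or right with probability 0.1 each, and the three neighbouring cells are assumed to be worth 1.0, 0.0 and -1.0. The function name `expected_next_value` is also just an illustrative choice.

```python
# Hypothetical values of the cells the robot could land in (assumed numbers).
V_NEXT = {"up": 1.0, "left": 0.0, "right": -1.0}

# Deterministic case: choosing "up" always lands in the upper cell.
DETERMINISTIC = {"up": 1.0}

# Stochastic (MDP) case: choosing "up" sometimes slips sideways (assumed probabilities).
STOCHASTIC = {"up": 0.8, "left": 0.1, "right": 0.1}


def expected_next_value(transition_probs, values):
    """Probability-weighted average of the values of the possible next cells."""
    return sum(p * values[cell] for cell, p in transition_probs.items())


print("deterministic:", expected_next_value(DETERMINISTIC, V_NEXT))
print("stochastic:   ", expected_next_value(STOCHASTIC, V_NEXT))
```

With these assumed numbers, the stochastic estimate is lower than the deterministic one, because the risk of slipping into the worse cell is now counted.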




The MDP formula

$ V(s) = \max_a \left( R(s,a) + \gamma \sum_{s'} P(s,a,s') V(s') \right) $
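  • s : The current (given) state.
  • s’ : The next state, i.e. the state reached after taking action a in state s.
  • a : An action available in state s.
  • R(s, a) : The reward for taking action a in state s.
  • γ : The discount factor, which weights future value against the immediate reward.
  • P(s, a, s’) : The probability of reaching s’ when taking action a in state s.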
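To see how the formula can be evaluated, here is a minimal sketch that applies it repeatedly (value iteration) to a tiny made-up MDP. Everything in it is an assumption for illustration only: the state names (`s0`, `goal`, `pit`, `done`), the action names, the rewards, the transition probabilities and the discount factor of 0.9 do not come from the original example.

```python
GAMMA = 0.9  # discount factor (assumed value)

STATES = ["s0", "goal", "pit", "done"]

# P[(s, a)] maps each next state s' to P(s, a, s') (hypothetical numbers).
P = {
    ("s0", "up"): {"goal": 0.8, "pit": 0.2},
    ("s0", "wait"): {"s0": 1.0},
    ("goal", "exit"): {"done": 1.0},
    ("pit", "exit"): {"done": 1.0},
}

# R[(s, a)] is the reward R(s, a) (hypothetical numbers).
R = {
    ("s0", "up"): 0.0,
    ("s0", "wait"): 0.0,
    ("goal", "exit"): 1.0,
    ("pit", "exit"): -1.0,
}


def actions(s):
    """Actions available in state s (empty for the absorbing state 'done')."""
    return [a for (state, a) in P if state == s]


def bellman_backup(V, s):
    """V(s) = max_a ( R(s,a) + gamma * sum_{s'} P(s,a,s') * V(s') )."""
    available = actions(s)
    if not available:
        return 0.0  # no actions left, so no further reward
    return max(
        R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in P[(s, a)].items())
        for a in available
    )


def value_iteration(tol=1e-9, max_iters=1000):
    """Apply the backup to every state until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_V = {s: bellman_backup(V, s) for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V
        V = new_V
    return V


V = value_iteration()
print(V)
```

In this toy setup, the value of `s0` works out to 0.9 × (0.8 × 1 + 0.2 × (−1)) = 0.54, since moving up is better than waiting in place.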
This post is licensed under CC BY 4.0 by the author.