What is an MDP?
Since few things are 100% certain in the real world, a formulation more general than the deterministic Bellman Equation is needed. For this reason, the Markov Decision Process (MDP) was devised.
Assume there is a case where the robot needs to choose a direction.
Under the deterministic Bellman Equation, the robot moves up 100% of the time, because the equation assigns no probability to going left or right.
In an MDP, however, the robot can end up moving in various directions, because each action carries some probability of going left or right.
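A minimal sketch of that difference in Python; the 0.8/0.1/0.1 split is an invented example, not a value from the text:

```python
import random

# Hypothetical transition model for the grid example above:
# the robot *intends* to move up, but the world is stochastic.
transition_probs = {
    "up": 0.8,     # intended direction (assumed probability)
    "left": 0.1,   # slips left (assumed)
    "right": 0.1,  # slips right (assumed)
}

def step(probs):
    """Sample the direction the robot actually moves in."""
    directions = list(probs)
    weights = [probs[d] for d in directions]
    return random.choices(directions, weights=weights)[0]

# Deterministic Bellman world: the robot would always go up.
# MDP world: mostly up, occasionally left or right.
print(step(transition_probs))
```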
The formula of an MDP
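Using the symbols listed below (plus a discount factor γ, which is assumed here since the list does not define it), the MDP version of the Bellman Equation can be written as:

V(s) = max_a [ R(s, a) + γ · Σ_{s'} P(s, a, s') · V(s') ]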
- s : The current (given) state.
- s' : The next state, i.e. the state reached after leaving s.
- a : An action taken in state s.
- R(s, a) : The reward received for taking action a in state s.
- P(s, a, s') : The probability of ending up in s' after taking action a in state s.
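To make the formula concrete, here is a minimal value-iteration sketch in Python; the two-state MDP, its rewards, and γ = 0.9 are invented for illustration:

```python
# A minimal value-iteration sketch of the formula above. The two-state MDP,
# its rewards, and the discount factor gamma = 0.9 are invented examples.

GAMMA = 0.9  # assumed discount factor

# P[s][a] -> list of (s_next, probability); R[s][a] -> immediate reward
P = {
    "s0": {"stay": [("s0", 1.0)],
           "go":   [("s1", 0.8), ("s0", 0.2)]},
    "s1": {"stay": [("s1", 1.0)],
           "go":   [("s0", 0.8), ("s1", 0.2)]},
}
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 2.0, "go": 0.0},
}

V = {s: 0.0 for s in P}  # start with zero value for every state

for _ in range(100):  # repeat the update until the values settle
    # V(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s, a, s') * V(s') ]
    V = {
        s: max(
            R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
            for a in P[s]
        )
        for s in P
    }

print(V)  # converged state values
```

Note how the inner sum weights each possible next state s' by P(s, a, s'); in the deterministic Bellman Equation that sum collapses to a single term with probability 1.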