What is a Living Penalty?
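A living penalty is a small negative reward that the agent receives for every step it takes before reaching a terminal state. It is applied in Reinforcement Learning so that the agent does not wander around or loop forever. Let’s assume that we are trying to find a way out of a maze: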
First, we set the living penalty to 0 ($R(s) = 0$). Then $V(s)$ will be:
Next, we set the living penalty to -0.04 ($R(s) = -0.04$). One thing to note is that the living penalty is applied when the agent leaves a box. Then $V(s)$ will be:
If we set the living penalty to -0.5 ($R(s) = -0.5$), then $V(s)$ will be:
This setting is more suitable than $R(s) = -0.04$ or $R(s) = 0$. But if we set $R(s) = -2.0$, then $V(s)$ will be:
With a penalty this large, the agent would rather go through the red box, because living for a long time incurs a bigger penalty than doing so.
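
To experiment with these settings, here is a minimal value iteration sketch. Everything about the environment is an assumption made for illustration, not the maze from this post: a hypothetical 4x3 grid with one wall, a green exit worth +1, a red box worth -1, deterministic moves, and no discounting ($\gamma = 1$). The names `step`, `value_iteration`, and `greedy_action` are made up for this example. The living penalty is charged each time the agent leaves a non-terminal box, matching the note above.

```python
# Hypothetical 4x3 grid world; the layout and rewards are assumptions for illustration.
ROWS, COLS = 3, 4
WALL = {(1, 1)}
TERMINALS = {(0, 3): 1.0, (1, 3): -1.0}  # green exit (+1) and red box (-1)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic move; bumping into the wall or an edge leaves the agent in place."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    if nxt in WALL or not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS):
        return state
    return nxt


def value_iteration(living_reward, gamma=1.0, sweeps=100):
    """Return V(s) after a fixed number of synchronous Bellman sweeps."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in WALL]
    V = {s: TERMINALS.get(s, 0.0) for s in states}
    for _ in range(sweeps):
        V = {
            s: (
                TERMINALS[s]  # terminal boxes keep their fixed value
                if s in TERMINALS
                # R(s) is paid on leaving s, then the best neighbour value is added
                else living_reward + gamma * max(V[step(s, a)] for a in ACTIONS)
            )
            for s in states
        }
    return V


def greedy_action(V, state):
    """Pick the action whose next box has the highest value (ties: first in ACTIONS)."""
    return max(ACTIONS, key=lambda a: V[step(state, a)])


# The box directly below the red box is a good place to compare settings.
for penalty in (0.0, -0.04, -0.5, -2.0):
    V = value_iteration(penalty)
    print(penalty, greedy_action(V, (2, 3)), round(V[(2, 3)], 2))
```

In this toy grid, a zero penalty gives every non-terminal box the same value, so the greedy choice is arbitrary and can even be a move that bumps into the edge forever. With -0.04 the agent walks around the red box, and with -2.0 it steps straight into it. The exact numbers depend on the assumed layout and will not match the figures in this post.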