What is Actor-Critic?

Actor-Critic is the architecture that TD3 builds on. As the name suggests, Actor-Critic is composed of two networks:

Actor network

The Actor plays the same role as the policy in policy gradient methods: given a state, it decides which action to take. The shape of the Actor is:
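As a rough illustration of a network that maps a state to an action, here is a minimal sketch in plain Python. It assumes a deterministic, TD3-style Actor with one hidden layer and a `tanh` output scaled by `max_action`. The class name, the helper functions `init_layer` and `dense`, the layer sizes, and the dimensions are all hypothetical choices for this example, not taken from the post.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Compute W x + b for one dense layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


class Actor:
    """Hypothetical Actor: state vector in, bounded action vector out."""

    def __init__(self, state_dim, action_dim, max_action, hidden=16, seed=0):
        rng = random.Random(seed)
        self.hidden_layer = init_layer(state_dim, hidden, rng)
        self.output_layer = init_layer(hidden, action_dim, rng)
        self.max_action = max_action

    def act(self, state):
        h = [max(0.0, v) for v in dense(state, self.hidden_layer)]  # ReLU
        out = dense(h, self.output_layer)
        # tanh squashes each output into [-1, 1]; scaling bounds the action.
        return [self.max_action * math.tanh(v) for v in out]


# Assumed sizes: a 3-dimensional state and a 2-dimensional action.
actor = Actor(state_dim=3, action_dim=2, max_action=2.0)
action = actor.act([0.1, -0.4, 0.7])
print(action)
```

The only structural point the sketch is meant to convey is the input and output: the Actor consumes a state and emits one action, with no value estimate involved.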

Critic network

The Critic evaluates how good a state is (or, in TD3, how good a state-action pair is) by estimating its value. The shape of the Critic is:
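A comparable sketch for the Critic, again in plain Python. It assumes a Q-style Critic as used in TD3, which scores a state together with an action; a state-only Critic would simply drop the action input. The class name, the helpers, the hidden size, and the dimensions are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Compute W x + b for one dense layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


class Critic:
    """Hypothetical Critic: (state, action) pair in, one scalar value out."""

    def __init__(self, state_dim, action_dim, hidden=16, seed=0):
        rng = random.Random(seed)
        self.hidden_layer = init_layer(state_dim + action_dim, hidden, rng)
        self.output_layer = init_layer(hidden, 1, rng)

    def value(self, state, action):
        x = list(state) + list(action)  # concatenate state and action
        h = [max(0.0, v) for v in dense(x, self.hidden_layer)]  # ReLU
        return dense(h, self.output_layer)[0]  # unbounded scalar estimate


# Assumed sizes: a 3-dimensional state and a 2-dimensional action.
critic = Critic(state_dim=3, action_dim=2)
q = critic.value([0.1, -0.4, 0.7], [0.5, -1.0])
print(q)
```

Unlike the Actor, the output here is a single unbounded number, since it is an estimate of value, not an action.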

As in policy gradient methods, the Actor's policy parameters $\theta$ are updated through gradient ascent, with the Critic supplying the value estimates:

$ \theta_{t+1} = \theta_{t} + \alpha \nabla_{\theta}J(\theta)|_{\theta_{t}} $

$ \nabla_{\theta}J(\theta) = \nabla_{\theta}\sum_{s\in S}d^{\pi}(s)V^{\pi}(s) = \nabla_{\theta}\sum_{s\in S}d^{\pi}(s)\sum_{a \in A} \pi_{\theta}(a|s)Q^{\pi}(s,a) $
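
To make the update rule concrete, the following sketch applies it to the simplest case I could think of: a single state (so $d^{\pi}(s) = 1$) with three actions and a softmax policy. Under that assumption the objective reduces to $J(\theta) = \sum_{a} \pi_{\theta}(a) Q(a)$, and its gradient has the closed form $\partial J / \partial \theta_k = \pi_{\theta}(k)\,(Q(k) - J(\theta))$. The values in `q_values` are made up and stand in for what a Critic would estimate; the function names, the step size, and the iteration count are likewise illustrative.

```python
import math


def softmax(theta):
    """Turn action preferences theta into probabilities pi_theta(a)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def objective(theta, q_values):
    """J(theta) = sum_a pi_theta(a) * Q(a) for a single state with d(s) = 1."""
    return sum(p * q for p, q in zip(softmax(theta), q_values))


def objective_gradient(theta, q_values):
    """Exact gradient for a softmax policy: dJ/dtheta_k = pi_k * (Q_k - J)."""
    pi = softmax(theta)
    j = sum(p * q for p, q in zip(pi, q_values))
    return [p * (q - j) for p, q in zip(pi, q_values)]


q_values = [1.0, 0.0, 2.0]  # made-up values standing in for the Critic's estimates
theta = [0.0, 0.0, 0.0]     # uniform initial policy
alpha = 0.5                 # step size (assumed)
j_start = objective(theta, q_values)

for _ in range(200):
    grad = objective_gradient(theta, q_values)
    theta = [t + alpha * g for t, g in zip(theta, grad)]  # ascent: add the gradient

j_end = objective(theta, q_values)
pi_end = softmax(theta)
best_action = max(range(len(pi_end)), key=lambda a: pi_end[a])
print(j_start, j_end, best_action)
```

Because the update adds the gradient instead of subtracting it, the policy shifts probability toward the action with the highest value and $J$ increases over the run. In a full Actor-Critic setup the values would themselves be learned and would change as the policy changes, which this single-state sketch deliberately leaves out.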
This post is licensed under CC BY 4.0 by the author.