What is Actor-Critic?
TD3 is built on the Actor-Critic architecture. As the name suggests, Actor-Critic consists of two networks:
Actor network
The Actor plays the role of the policy in policy gradient methods: given a state, it decides which action to take. The shape of the Actor is:

Critic network
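The Critic evaluates how good the Actor's choices are by estimating a value function, such as $V^{\pi}(s)$ for a state or $Q^{\pi}(s,a)$ for a state-action pair. The shape of the Critic is: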

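To make the two roles concrete, here is a minimal sketch of an Actor and a Critic in plain Python. It is only an illustration under simplifying assumptions: each "network" is a single linear layer rather than a deep network, the class names `Actor` and `Critic`, the methods `act` and `q_value`, and the dimensions are all hypothetical, and the Critic is assumed to score a state-action pair, matching the $Q^{\pi}(s,a)$ term used in the objective.

```python
import math
import random


class Actor:
    """Deterministic policy sketch: maps a state to an action in [-1, 1]^action_dim."""

    def __init__(self, state_dim, action_dim, seed=0):
        rng = random.Random(seed)
        # One weight row per action dimension (assumed single linear layer).
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(state_dim)]
                  for _ in range(action_dim)]
        self.b = [0.0] * action_dim

    def act(self, state):
        # tanh squashes each output so the action stays bounded.
        return [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
                for row, b in zip(self.W, self.b)]


class Critic:
    """Value sketch: maps a (state, action) pair to a scalar score Q(s, a)."""

    def __init__(self, state_dim, action_dim, seed=1):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(state_dim + action_dim)]
        self.c = 0.0

    def q_value(self, state, action):
        # Concatenate state and action, then apply a linear map.
        x = list(state) + list(action)
        return sum(w * xi for w, xi in zip(self.w, x)) + self.c


actor = Actor(state_dim=3, action_dim=2)
critic = Critic(state_dim=3, action_dim=2)

state = [0.5, -1.0, 2.0]
action = actor.act(state)               # the Actor decides on an action
score = critic.q_value(state, action)   # the Critic scores that decision
```

The point of the split is that the Actor never sees a reward directly in this sketch; it only produces actions, while the Critic turns a state and an action into a single number that can be used as a training signal.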
The Actor's policy parameters $\theta$ are updated by gradient ascent on the objective $J(\theta)$, as in policy gradient, with the Critic supplying the value estimates:
$ \nabla_{\theta}J(\theta)\ =\ \nabla_{\theta}\sum_{s\in S}d^{\pi}(s)V^{\pi}(s)\ =\ \nabla_{\theta}\sum_{s\in S}d^{\pi}(s)\sum_{a \in A} \pi_{\theta}(a|s)Q^{\pi}(s,a) $
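As a rough illustration of gradient ascent on this objective, the sketch below uses the simplest possible case: a single state, so $d^{\pi}(s) = 1$ and $J(\theta) = \sum_{a} \pi_{\theta}(a) Q(a)$. Several things here are assumptions made for the example and not part of the method described above: the policy is a softmax over three discrete actions, the values in `q` are made-up fixed numbers standing in for a Critic's estimates, and the step size and iteration count are arbitrary. In a real Actor-Critic setup the Critic's estimates change as the policy changes.

```python
import math


def softmax(theta):
    # Numerically stable softmax: pi_theta(a) for each action a.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def objective(theta, q):
    # J(theta) = sum_a pi_theta(a) * Q(a) for a single state with d(s) = 1.
    return sum(p * qa for p, qa in zip(softmax(theta), q))


def gradient(theta, q):
    # For a softmax policy: dJ/dtheta_k = pi_k * (Q_k - J).
    pi = softmax(theta)
    j = sum(p * qa for p, qa in zip(pi, q))
    return [p * (qa - j) for p, qa in zip(pi, q)]


q = [1.0, 2.0, 0.5]          # hypothetical Critic estimates, held fixed
theta = [0.0, 0.0, 0.0]      # policy parameters, one logit per action
step_size = 0.5              # arbitrary choice for the example

j_start = objective(theta, q)
for _ in range(200):
    g = gradient(theta, q)
    # Gradient ascent: move theta in the direction that increases J.
    theta = [t + step_size * gi for t, gi in zip(theta, g)]
j_end = objective(theta, q)

policy = softmax(theta)
best_action = policy.index(max(policy))
```

After the updates, the policy puts most of its probability on the action with the largest value in `q`, which is the behaviour the objective rewards.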