Soft Actor Critic (SAC)

Related to Actor-Critic Methods.

SAC is an off-policy actor-critic algorithm that adds an entropy term to the reward, encouraging the policy to keep exploring by remaining stochastic during training.

Like TD3, it uses two critic networks and takes the minimum of their estimates, which reduces overestimation bias and improves stability.
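A minimal sketch of how the two critics and the entropy term might combine into a single bootstrap target for one transition. The function name, argument names, and default values for `gamma` and `alpha` are illustrative assumptions, not taken from any particular library; in a real implementation the Q-values would come from target networks and the log-probability from the current policy.

```python
def soft_td_target(reward, done, q1_next, q2_next, log_prob_next,
                   gamma=0.99, alpha=0.2):
    """Soft Bellman target for a single transition (illustrative sketch).

    q1_next, q2_next: the two critics' estimates for the next state-action pair.
    log_prob_next:    log pi(a' | s') of the action sampled at the next state.
    alpha:            entropy weight (assumed fixed here).
    """
    # Taking the smaller of the two estimates counters overestimation.
    min_q_next = min(q1_next, q2_next)
    # Subtracting alpha * log pi adds the entropy bonus to the next-state value.
    soft_value_next = min_q_next - alpha * log_prob_next
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * soft_value_next
```

Both critics would then be regressed toward this same target.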

$$π^{*} = \arg\max_{π} \mathbb{E}_{τ \sim π} \left[ \sum_{t=0}^{\infty} γ^{t} \left( R(s_{t}, a_{t}, s_{t+1}) + \underbrace{α H(π(\cdot \mid s_{t}))}_{\text{entropy}} \right) \right]$$
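To make the objective concrete, here is a small sketch that evaluates the bracketed sum for one finite trajectory. It assumes a one-dimensional Gaussian policy, whose entropy has the closed form 0.5 * log(2πe σ²); the function names and the choice of a Gaussian are assumptions made for illustration only.

```python
import math


def gaussian_entropy(std):
    """Differential entropy of a 1-D Gaussian with standard deviation std."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


def soft_return(rewards, stds, gamma=0.99, alpha=0.2):
    """Discounted sum of (reward + alpha * policy entropy) over one trajectory.

    rewards[t] is R(s_t, a_t, s_{t+1}); stds[t] is the policy's standard
    deviation at s_t (hypothetical inputs for this sketch).
    """
    total = 0.0
    for t, (r, std) in enumerate(zip(rewards, stds)):
        total += gamma ** t * (r + alpha * gaussian_entropy(std))
    return total
```

With `alpha=0` this reduces to the ordinary discounted return; with `alpha>0`, a trajectory collected under a wider (more random) policy scores higher for the same rewards.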

SAC qualities:

stable training: twin critics reduce value overestimation.
exploration: entropy regularization prevents premature convergence to a near-deterministic policy.
sample efficiency: off-policy learning with a replay buffer reuses past experience.
automatic tuning: learns $α$ to balance exploration and exploitation.
continuous actions: handles continuous, high-dimensional action spaces directly.
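The automatic tuning of $α$ can be sketched as one gradient step on a temperature loss. This version assumes the common variant that optimizes log α with the loss -log α · (mean log π + target entropy); the function name, learning rate, and the use of plain gradient descent are illustrative assumptions. A frequently used heuristic sets the target entropy to the negative of the action dimension.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=1e-3):
    """One gradient-descent step on the temperature loss (illustrative sketch).

    Loss (assumed variant): L = -log_alpha * (mean(log_probs) + target_entropy)
    log_probs: log pi(a | s) for a batch of actions sampled from the policy.
    """
    mean_log_prob = sum(log_probs) / len(log_probs)
    # dL/d(log_alpha) = -(mean log pi + target entropy)
    grad = -(mean_log_prob + target_entropy)
    return log_alpha - lr * grad


# Example: the batch entropy estimate (-mean log pi = 2.0) exceeds the
# target (-1.0), so the step lowers alpha and weakens the entropy bonus.
new_log_alpha = update_log_alpha(0.0, [-1.0, -3.0], target_entropy=-1.0, lr=0.1)
alpha = math.exp(new_log_alpha)
```

When the policy's entropy falls below the target, the sign flips and α grows, pushing the policy back toward exploration.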

🚀 Costin Chitic
