PPO
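Proximal Policy Optimization: a policy gradient RL algorithm that keeps each policy update close to the previous policy, approximating a trust region, by optimizing a clipped surrogate objective. PPO is the de facto default RL algorithm for robot locomotion (legged robots, humanoids) and sim-to-real transfer. It is stable and simple to implement and tune. It is also more sample efficient than vanilla policy gradient methods because it reuses each batch of on-policy data for several epochs of minibatch updates. It gets much of TRPO's update stability from first-order gradient steps alone, avoiding the computational cost of TRPO's constrained second-order optimization. In practice an entropy bonus is often added to the objective to sustain exploration.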
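The clipped surrogate objective at the core of PPO can be sketched in plain Python. This is a minimal, framework-free illustration and not a reference implementation. It assumes you already have per-sample log-probabilities of the taken actions under the new and old policies, plus precomputed advantage estimates (e.g. from GAE). The function name `ppo_clip_loss` is hypothetical, and `clip_eps=0.2` is just a commonly used default. The objective maximized is E_t[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)], where r_t = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t).

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch.

    Hypothetical helper for illustration: minimizing the returned value
    maximizes E_t[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)].
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(advantages) == 0:
        raise ValueError("batch must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r_t, computed from log-probs for numerical stability.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio into [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms,
        # so the update gains nothing from moving the ratio outside the clip range.
        total += min(ratio * adv, clipped_ratio * adv)

    # Negate so a standard minimizer performs gradient ascent on the objective.
    return -total / len(advantages)


if __name__ == "__main__":
    # Ratio 1.5 with a positive advantage: the gain is capped at 1 + eps = 1.2.
    print(ppo_clip_loss([math.log(1.5)], [0.0], [1.0]))
    # Ratio 1.5 with a negative advantage: the unclipped (worse) term is kept.
    print(ppo_clip_loss([math.log(1.5)], [0.0], [-1.0]))
```

The `min` makes the objective asymmetric. Clipping only removes the incentive to move further in a direction that already improves the objective, while changes that make it worse are still fully penalized. Real implementations compute the same expression on tensors in an autodiff framework so gradients flow through `new_log_probs`, and typically add a value-function loss and the entropy bonus; both are omitted here.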