TRPO

Trust Region Policy Optimization — a policy gradient RL algorithm that constrains each policy update to a trust region (measured by KL divergence) to prevent large, destructive updates. TRPO provides monotonic improvement guarantees but is computationally expensive due to the constrained optimization. PPO approximates TRPO's benefits with a simpler clipped surrogate objective.

RL

Explore More Terms

Browse the full robotics glossary.

Back to Glossary