Glossary

TRPO

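Trust Region Policy Optimization: a policy gradient reinforcement learning (RL) algorithm that constrains each policy update to a trust region, measured by the KL divergence between the old and new policies, to prevent large, destructive updates. In theory, TRPO provides monotonic improvement guarantees; the practical algorithm relies on approximations and is computationally expensive, because every update solves a constrained optimization problem (typically with conjugate gradient and a backtracking line search). PPO approximates TRPO's benefits with a simpler clipped surrogate objective that can be optimized with ordinary first-order methods.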
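To make the two ideas in this definition concrete, here is a minimal sketch, assuming discrete actions with strictly positive probabilities. It shows a KL-based trust-region check of the kind TRPO enforces and the clipped surrogate that PPO uses instead. The function names, the `max_kl=0.01` threshold, and the `clip_eps=0.2` value are illustrative assumptions, not taken from any particular library, and the sketch omits TRPO's actual constrained solver.

```python
import numpy as np


def kl_divergence(p_old, p_new):
    """Mean KL(p_old || p_new) over states.

    Each row is a categorical action distribution for one state.
    Assumes every probability is strictly positive (no log(0) handling).
    """
    p_old = np.asarray(p_old, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    per_state = np.sum(p_old * (np.log(p_old) - np.log(p_new)), axis=-1)
    return float(np.mean(per_state))


def within_trust_region(p_old, p_new, max_kl=0.01):
    """TRPO-style check: accept an update only if the mean KL stays small.

    max_kl is an illustrative threshold, chosen here as an assumption.
    """
    return kl_divergence(p_old, p_new) <= max_kl


def ppo_clipped_surrogate(ratio, advantages, clip_eps=0.2):
    """PPO's clipped surrogate objective (to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s) for each sampled action.
    Clipping removes the incentive to push the ratio outside
    [1 - clip_eps, 1 + clip_eps], standing in for the KL constraint.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))


if __name__ == "__main__":
    old = [[0.5, 0.5]]
    small_step = [[0.52, 0.48]]
    large_step = [[0.9, 0.1]]
    print(within_trust_region(old, small_step))  # small change in the policy
    print(within_trust_region(old, large_step))  # large change in the policy
    print(ppo_clipped_surrogate([1.5], [1.0]))   # ratio above the clip range
```

In this sketch the small step passes the KL check and the large one does not, while the clipped surrogate simply stops rewarding ratios beyond the clip range. The contrast illustrates the trade-off in the definition: TRPO enforces the constraint explicitly, whereas PPO folds a similar restriction into the objective itself.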
