Policy Gradient

A family of RL algorithms that directly optimize the policy parameters by estimating the gradient of the expected return with respect to policy parameters. REINFORCE, PPO, TRPO, and SAC all belong to this family. Policy gradient methods work with continuous action spaces (unlike value-based methods like DQN) and are the standard RL approach for robot control tasks.

Robot LearningRL

Explore More Terms

Browse the full robotics glossary with 1,000+ terms.

Back to Glossary