Policy (robot)
In robot learning, a policy (denoted π) is a function that maps observations to actions: a = π(o). The policy is the learned "brain" of the robot, determining what to do at every timestep given what it perceives. Policies can be represented as neural networks (neural policies), decision trees, Gaussian processes, or lookup tables. They can be deterministic (one action per observation) or stochastic (a distribution over actions, written π(a | o)). Policy quality is measured by task success rate across diverse conditions, not just on training demonstrations. The core challenge of robot learning is training policies that generalize reliably beyond their training distribution.
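The deterministic/stochastic distinction above can be sketched with a toy linear policy. This is a minimal illustration, not any particular library's API; the observation dimension, action count, and random weights are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 4-dim observation, 2 discrete actions.
OBS_DIM, N_ACTIONS = 4, 2
W = rng.normal(scale=0.1, size=(N_ACTIONS, OBS_DIM))  # policy parameters
b = np.zeros(N_ACTIONS)

def action_logits(obs):
    """Score each action given the observation (a linear 'policy network')."""
    return W @ obs + b

def deterministic_policy(obs):
    """Deterministic policy: one fixed action per observation (argmax)."""
    return int(np.argmax(action_logits(obs)))

def stochastic_policy(obs, rng):
    """Stochastic policy: softmax over logits gives pi(a | o), then sample."""
    logits = action_logits(obs)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(N_ACTIONS, p=probs)), probs

obs = rng.normal(size=OBS_DIM)       # what the robot perceives this timestep
a_det = deterministic_policy(obs)    # same obs always yields this action
a_sto, probs = stochastic_policy(obs, rng)  # sampled from pi(a | o)
```

A neural policy replaces the linear `action_logits` with a deep network; the interface (observation in, action or action distribution out) stays the same.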