Policy Distillation
Compressing a large or complex policy (the teacher) into a smaller, faster policy (the student) by training the student to match the teacher's action distribution, typically by minimizing the KL divergence between the two distributions. Policy distillation is used to compress RL policies for real-time deployment, to transfer policies from simulation to real hardware, and to combine multiple specialized policies into one.
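A minimal sketch of the distillation objective described above, using NumPy. It computes KL(teacher ‖ student) over action distributions averaged across a batch of states; the temperature parameter (a common but assumed detail here) softens both distributions before comparison. In practice this loss would be minimized with a gradient-based optimizer over the student's parameters.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits to action probabilities, with temperature softening."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) over a batch of states.

    teacher_logits, student_logits: arrays of shape (batch, num_actions).
    Returns a scalar; zero when the distributions match exactly.
    """
    p = softmax(teacher_logits, temperature)            # teacher action probs
    log_q = np.log(softmax(student_logits, temperature))  # student log-probs
    kl = (p * (np.log(p) - log_q)).sum(axis=-1)         # per-state KL
    return kl.mean()

# Example: a student that matches the teacher incurs zero loss;
# a mismatched student incurs a positive loss.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 3))   # 4 states, 3 actions (toy sizes)
student_bad = rng.normal(size=(4, 3))
print(distillation_loss(teacher, teacher))      # ~0.0
print(distillation_loss(teacher, student_bad))  # > 0
```

Note that KL(teacher ‖ student), rather than the reverse direction, is the standard choice: it forces the student to place probability mass wherever the teacher does.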
Robot Learning, ML