Online RL
Reinforcement learning in which the agent interacts with the environment during training, collecting new transitions and updating its policy from them as they arrive. Because the agent can explore states and actions that a static dataset does not cover, online RL can reach higher performance than offline RL, but on physical robots it requires safe exploration mechanisms. A common workaround is to run the online RL phase in simulation and then deploy the learned policy on hardware via sim-to-real transfer.
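
The interact, collect, update cycle described in this entry can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the five-state chain environment, the names `step` and `train_online`, the hyperparameters, and the choice of tabular Q-learning with epsilon-greedy exploration as one simple online method among many.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, goal at state 4
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment (an assumption): reward 1.0 on reaching the last state."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train_online(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: every new transition is used for an update at once."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Exploration: act randomly with probability epsilon or on ties.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Interaction: the agent's own action produces a fresh transition.
            next_state, reward, done = step(state, action)
            # Update: adjust the value estimate from that transition right away.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train_online()
# Greedy action per state; the terminal state is never updated, so its entry is arbitrary.
greedy_policy = [0 if q[0] > q[1] else 1 for q in q_table]
```

An offline variant would instead run the same update over a fixed list of logged transitions without ever calling `step`, so it could only learn about state-action pairs already present in that log.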