Gradient Clipping
Capping the magnitude of gradients during training to prevent exploding gradients from destabilizing optimization. In the common clip-by-global-norm variant, the gradient is rescaled by the factor min(1, c / ||g||) whenever its overall L2 norm ||g|| exceeds a threshold c (typically 1.0–10.0), which shrinks the update while preserving its direction. Gradient clipping is standard practice for training transformers, RNNs, and RL policies, where reward variance can cause large gradient spikes.
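
A minimal sketch of clip-by-global-norm in a PyTorch training step, assuming a toy model, data, and threshold chosen purely for illustration; it uses the library's built-in clip_grad_norm_ utility, which applies exactly the rescaling rule above.

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer, and data for illustration only.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 16), torch.randn(32, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Rescale gradients in place so their global L2 norm is at most max_norm
# (here c = 1.0). Returns the pre-clipping norm, handy for logging spikes.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```

Clipping is applied after the backward pass and before the optimizer step, so the optimizer only ever sees the bounded gradients; monitoring the returned pre-clip norm is a common way to spot training instability early.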