Learning Rate

The step size gradient descent uses to update model parameters: at each step, the parameters move against the gradient of the loss by an amount proportional to the learning rate. A rate set too high causes the loss to diverge; one set too low causes slow convergence. Learning rate scheduling (warmup followed by decay) is critical for stable training, as sketched below. Typical initial learning rates: around 1e-4 for fine-tuning pre-trained models, and around 3e-4 for training from scratch with Adam. The learning rate is often the single most important hyperparameter to tune.
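To make the warmup-plus-decay idea concrete, here is a minimal sketch in plain Python. The function name lr_at_step, the step counts, and the min_lr floor are illustrative assumptions; the 3e-4 peak matches the from-scratch Adam value above, and the schedule shape (linear warmup, cosine decay) is one common choice among several.

```python
import math

def lr_at_step(step, peak_lr=3e-4, warmup_steps=1_000,
               total_steps=100_000, min_lr=3e-5):
    """Linear warmup to peak_lr, then cosine decay to min_lr (illustrative)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from ~0 up to peak_lr over warmup_steps.
        return peak_lr * (step + 1) / warmup_steps
    # Decay: cosine curve from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# The learning rate scales every parameter update:
#   theta <- theta - lr * gradient
theta, gradient = 0.5, 0.2          # toy scalar parameter and gradient
for step in (0, 500, 1_000, 50_000, 100_000):
    lr = lr_at_step(step)
    theta -= lr * gradient          # one gradient-descent step
    print(f"step {step:>7}: lr = {lr:.2e}")
```

In practice, training frameworks ship equivalent schedulers (e.g. PyTorch's torch.optim.lr_scheduler), so a hand-rolled function like this mainly serves to show the shape of the curve.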
