Gradient Descent
An optimization method that updates model parameters in the direction that most reduces prediction error.
Gradient descent is the optimization process used to improve a model during training. Once gradients have been computed, the optimizer updates each parameter in the direction that reduces the model's error.
The name comes from the idea of moving downhill on an error landscape. If the current model state is a point on that landscape, gradient descent follows the slope downward toward better-performing parameter values.
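The downhill idea above can be sketched in a few lines. This is a minimal illustration, not a production optimizer: the toy loss, starting point, and learning rate are assumptions chosen for clarity.

```python
# Toy loss L(w) = (w - 3)^2, whose minimum sits at w = 3.
# The gradient dL/dw = 2 * (w - 3) points uphill, so each
# update steps in the opposite direction.

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # starting point on the error landscape (assumption)
lr = 0.1   # learning rate: how far each downhill step moves
for _ in range(100):
    w -= lr * grad(w)  # the core gradient descent update

print(round(w, 4))  # prints 3.0: w has slid down to the minimum
```

Each iteration applies the same rule, w ← w − lr · ∇L(w): follow the slope downward until the parameter settles at the bottom of the landscape.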
Common Variants
- Batch gradient descent: uses the entire dataset for each update
- Stochastic gradient descent: updates using one example at a time
- Mini-batch gradient descent: uses small groups of examples; standard in practice
- Adam / AdamW: popular adaptive optimizers used in LLM training
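The first three variants differ only in how many examples feed each update, which a single sketch can show. Everything here (the synthetic data, step counts, and learning rate) is an illustrative assumption; setting the batch size to 1 gives the stochastic variant and setting it to the full dataset gives the batch variant.

```python
# Fitting y ≈ w * x by least squares; the true weight is 2.0.
import random

random.seed(0)
data = [(x, 2.0 * x) for x in range(1, 21)]

def grad(w, batch):
    # Derivative of mean squared error over just this batch.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(batch_size, steps=200, lr=0.001):
    w = 0.0
    for _ in range(steps):
        batch = random.sample(data, batch_size)  # 1 -> stochastic,
        w -= lr * grad(w, batch)                 # len(data) -> batch GD
    return w

print(round(train(batch_size=4), 3))  # mini-batch: lands close to 2.0
```

Mini-batches are the practical middle ground: each update is cheap like stochastic descent but averages out some of its noise, which is why they are standard in practice.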
Gradient descent must be tuned carefully. If the steps are too aggressive, training becomes unstable; if they are too timid, learning takes too long. This is why settings like learning rate schedules are so important in large-scale AI training runs.
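Both failure modes are easy to reproduce on a toy loss. The specific loss L(w) = w² and the three learning rates below are assumptions picked to make the instability threshold visible, not values from any real training run.

```python
# L(w) = w^2 has gradient 2w, so each update multiplies w by (1 - 2*lr).
# If |1 - 2*lr| > 1, the updates overshoot and |w| grows every step.

def run(lr, steps=50):
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w  # gradient descent update
    return w

print(abs(run(lr=1.1)))    # too aggressive: blows up (unstable)
print(abs(run(lr=0.001)))  # too timid: barely closer to 0 after 50 steps
print(abs(run(lr=0.4)))    # well-tuned: rapidly reaches the minimum at 0
```

A learning rate schedule addresses exactly this tension: start with larger steps for fast progress, then shrink them over time so the later updates stay stable.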