Gradient Descent
An optimization method that updates model parameters in the direction that most reduces prediction error.
Gradient descent is the optimization process used to improve a model during training. Once gradients have been computed, the optimizer updates each parameter in the direction that reduces the model's error.
The name comes from the idea of moving downhill on an error landscape. If the current model state is a point on that landscape, gradient descent follows the slope downward toward better-performing parameter values.
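The downhill idea above can be sketched in a few lines. This is a minimal illustration, not a production optimizer: the toy loss, starting point, and learning rate are assumptions chosen for clarity.

```python
# Toy loss L(w) = (w - 3)^2, whose minimum sits at w = 3.
# The gradient dL/dw = 2 * (w - 3) points uphill, so each
# update steps in the opposite direction.

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # starting point on the error landscape (assumption)
lr = 0.1   # learning rate: how far each downhill step moves
for _ in range(100):
    w -= lr * grad(w)  # the core gradient descent update

print(round(w, 4))  # prints 3.0: w has slid down to the minimum
```

Each iteration applies the same rule, w ← w − lr · ∇L(w): follow the slope downward until the parameter settles at the bottom of the landscape.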
Common Variants
- Batch gradient descent: uses the entire dataset for each update
- Stochastic gradient descent: updates using one example at a time
- Mini-batch gradient descent: uses small groups of examples; standard in practice
- Adam / AdamW: popular adaptive optimizers used in LLM training
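The first three variants differ only in how many examples feed each update, which a single sketch can show. Everything here (the synthetic data, step counts, and learning rate) is an illustrative assumption; setting the batch size to 1 gives the stochastic variant and setting it to the full dataset gives the batch variant.

```python
# Fitting y ≈ w * x by least squares; the true weight is 2.0.
import random

random.seed(0)
data = [(x, 2.0 * x) for x in range(1, 21)]

def grad(w, batch):
    # Derivative of mean squared error over just this batch.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(batch_size, steps=200, lr=0.001):
    w = 0.0
    for _ in range(steps):
        batch = random.sample(data, batch_size)  # 1 -> stochastic,
        w -= lr * grad(w, batch)                 # len(data) -> batch GD
    return w

print(round(train(batch_size=4), 3))  # mini-batch: lands close to 2.0
```

Mini-batches are the practical middle ground: each update is cheap like stochastic descent but averages out some of its noise, which is why they are standard in practice.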
Gradient descent must be tuned carefully. If the steps are too aggressive, training becomes unstable; if they are too timid, learning takes too long. This is why settings like learning rate schedules are so important in large-scale AI training runs.
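Both failure modes are easy to reproduce on a toy loss. The specific loss L(w) = w² and the three learning rates below are assumptions picked to make the instability threshold visible, not values from any real training run.

```python
# L(w) = w^2 has gradient 2w, so each update multiplies w by (1 - 2*lr).
# If |1 - 2*lr| > 1, the updates overshoot and |w| grows every step.

def run(lr, steps=50):
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w  # gradient descent update
    return w

print(abs(run(lr=1.1)))    # too aggressive: blows up (unstable)
print(abs(run(lr=0.001)))  # too timid: barely closer to 0 after 50 steps
print(abs(run(lr=0.4)))    # well-tuned: rapidly reaches the minimum at 0
```

A learning rate schedule addresses exactly this tension: start with larger steps for fast progress, then shrink them over time so the later updates stay stable.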