Training
Loss Function
A mathematical measure of how wrong a model’s predictions are during training.
A loss function converts model error into a number. The higher the loss, the worse the model is performing on a given example or batch. Training aims to minimize this value over time.
Different tasks use different loss functions. Classification models often use cross-entropy loss, regression models may use mean squared error, and language models optimize next-token prediction loss across sequences of tokens.
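The two common losses mentioned above can be computed by hand. This is a minimal sketch in pure Python; the function names and example numbers are illustrative, not from any particular library:

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-probability the model assigned to the correct class."""
    return -math.log(probs[target_index])

def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and true values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

# A confident, correct classification yields a low loss...
low = cross_entropy([0.9, 0.05, 0.05], target_index=0)   # ~0.105
# ...while a confident, wrong one yields a much higher loss.
high = cross_entropy([0.05, 0.9, 0.05], target_index=0)  # ~3.0

mse = mean_squared_error([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])
```

Note how cross-entropy punishes confident mistakes sharply: the model that put 90% probability on the wrong class pays roughly 30 times the loss of the one that put 90% on the right class.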
Why it matters: The loss function defines what "good performance" means during training. It is the signal the model uses to learn.
What Loss Functions Do
- Quantify error — turn prediction quality into a numeric objective
- Guide optimization — provide a target for gradient descent
- Enable comparison — track whether training is improving
- Shape behavior — different losses encourage different outcomes
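The "guide optimization" point can be made concrete with a tiny example: gradient descent fitting a one-parameter model `y = w * x` by repeatedly stepping against the gradient of the MSE loss. The data and learning rate here are hypothetical, chosen so the true answer is `w = 2`:

```python
# Hypothetical training data generated by the true relationship y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def mse(w):
    """MSE loss of the model y = w * x on the data above."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w = 0.0     # start from a deliberately wrong parameter
lr = 0.05   # learning rate

for _ in range(200):
    # Analytic gradient of MSE with respect to w: (2/n) * sum((w*x - y) * x)
    grad = 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # step downhill on the loss surface

# w converges toward 2.0, and mse(w) toward 0.
```

Every real training loop is a scaled-up version of this: compute the loss, compute its gradient with respect to the parameters, and step the parameters in the direction that reduces the loss.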
It is important to watch both training loss and validation loss. If training loss keeps falling while validation loss rises, the model is likely overfitting: memorizing the training data rather than learning patterns that generalize.
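That divergence pattern can be detected mechanically. Below is a minimal sketch of such a check; the function name, the `patience` parameter, and the loss curves are all made up for illustration:

```python
def overfitting_signal(train_losses, val_losses, patience=3):
    """Return True if validation loss has risen for `patience` consecutive
    epochs while training loss kept falling -- the classic overfitting sign."""
    streak = 0
    for i in range(1, len(val_losses)):
        if val_losses[i] > val_losses[i - 1] and train_losses[i] < train_losses[i - 1]:
            streak += 1
            if streak >= patience:
                return True
        else:
            streak = 0
    return False

# Hypothetical curves: training loss keeps improving,
# but validation loss turns around after epoch 2.
train = [1.0, 0.7, 0.5, 0.40, 0.30, 0.25]
val   = [1.1, 0.8, 0.7, 0.75, 0.80, 0.85]
diverging = overfitting_signal(train, val)  # True
```

Frameworks typically expose this idea as early stopping: halt training, or restore the best checkpoint, once validation loss stops improving.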