Data Augmentation

Techniques that expand training data by creating modified versions of existing examples — like rotating images or paraphrasing text.

Data augmentation artificially increases dataset size by applying label-preserving transformations to existing examples. For images this includes rotations, crops, color jitter, and flips. For text it includes paraphrasing, word substitution, and back-translation.
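As a rough illustration of the image transformations listed above, here is a minimal sketch in plain Python. It assumes an image is a list of rows of pixel values. The function names (horizontal_flip, rotate90, random_crop, augment) are made up for this example and do not come from any particular library.

```python
import random

def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def random_crop(image, height, width, rng):
    """Cut a height x width window at a random position."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]

def augment(image, label, rng):
    """Return a randomly flipped and rotated copy; the label is unchanged."""
    out = [list(row) for row in image]
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    # Apply between zero and three quarter turns.
    for _ in range(rng.randint(0, 3)):
        out = rotate90(out)
    return out, label
```

Calling augment(image, "cat", random.Random(0)) several times with different seeds would yield several variants of one labeled example, each still labeled "cat".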

Augmentation acts as a powerful regularizer by exposing the model to more variations of the same underlying concepts. It's especially valuable when labeled data is limited.

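Rule of thumb: a transformation is valid if it preserves the correct label for the task at hand. A rotated cat is still a cat, but a handwritten 6 rotated by 180 degrees reads as a 9, so the same rotation is not valid for digit recognition.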

Modern image training often uses aggressive augmentation pipelines that chain several random transformations per example. Text augmentation is trickier because small changes can alter meaning, but techniques like back-translation and LLM paraphrasing are common.
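Word substitution, the simplest of the text techniques, can be sketched as follows. This is a toy example under stated assumptions: the SYNONYMS table is hypothetical and hand-written, and substitute_words is an illustrative name, not a library function. A real pipeline would draw replacements from a thesaurus or a language model and would check that the meaning, and therefore the label, survives.

```python
import random

# Hypothetical, hand-written synonym table for illustration only.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "movie": ["film"],
}

def substitute_words(sentence, rng, prob=0.5):
    """Replace each word that has a known synonym with probability prob."""
    words = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < prob:
            words.append(rng.choice(options))
        else:
            words.append(word)
    return " ".join(words)
```

Keeping prob low limits how far each augmented sentence drifts from the original, which matters because, as noted above, small changes to text can alter its meaning.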
