
BERT

Bidirectional Encoder Representations from Transformers — a Google-built language model that reads text in both directions.

BERT, released by Google in 2018, revolutionized NLP by introducing bidirectional pretraining. Unlike GPT-style models that read left-to-right, BERT attends to the whole sentence at once, making it especially strong at understanding tasks like classification and named-entity recognition.

BERT pioneered the "pretrain-then-fine-tune" paradigm that dominates modern NLP. It's trained with masked language modeling — randomly masking tokens and asking the model to predict them from surrounding context.
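The masking procedure can be sketched in a few lines of plain Python. This is a minimal illustration, not BERT's actual training code: `mask_tokens` is a hypothetical helper, and the 80/10/10 split (80% of selected positions become `[MASK]`, 10% become a random token, 10% stay unchanged) follows the recipe described in the original BERT work.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, vocab=None, seed=0):
    """BERT-style corruption sketch (hypothetical helper): select roughly
    mask_prob of the positions as prediction targets; of those, 80% become
    [MASK], 10% become a random vocabulary token, and 10% stay unchanged.
    Returns the corrupted sequence and (position, original token) targets."""
    rng = random.Random(seed)
    vocab = vocab or tokens          # stand-in vocabulary for the sketch
    corrupted, targets = list(tokens), []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets.append((i, tok))  # the model must predict tok at position i
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = "[MASK]"
            elif roll < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: keep the original token unchanged
    return corrupted, targets

tokens = "the cat sat on the mat".split()
corrupted, targets = mask_tokens(tokens, mask_prob=0.3, seed=42)
print(corrupted)  # sentence with some positions corrupted
print(targets)    # the positions and original tokens to recover
```

During pretraining, the model sees only the corrupted sequence and is scored on how well it recovers the original tokens at the target positions, using context from both directions.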

Impact: At launch, BERT set new state-of-the-art results on 11 NLP tasks and helped cement the transformer era in NLP. It's still widely used for embeddings and classification today.

Variants like RoBERTa, ALBERT, and DistilBERT improved on the original. BERT-style encoders power search ranking, semantic search, and sentence embeddings across countless production systems.
