Models & Architecture
Positional Encoding
A technique for injecting information about token positions into transformer models, which otherwise have no notion of order.
Because self-attention treats its input as an unordered set, a transformer processing tokens in parallel cannot distinguish "dog bites man" from "man bites dog". Positional encoding adds order information to each token's embedding so the model knows where each token sits in the sequence.
The original transformer used sinusoidal positional encodings — fixed patterns of sine and cosine functions at different frequencies. Later models introduced learned positional embeddings and relative position encodings.
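The sinusoidal scheme can be sketched directly from its defining formulas, PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). A minimal pure-Python version (function name and dimensions are illustrative):

```python
import math

def sinusoidal_pe(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    # Each dimension pair oscillates at its own frequency, so every
    # position gets a unique, fixed pattern added to its embedding.
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Because the patterns are fixed functions of position rather than learned parameters, they can in principle be evaluated at positions longer than any sequence seen in training.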
Modern approach: Rotary Position Embedding (RoPE) encodes absolute positions by rotating query and key vectors, so that the attention score between two tokens depends only on their relative distance. This property tends to help with long contexts and extrapolation beyond training lengths.
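The rotation idea can be illustrated on a single vector: consecutive dimension pairs are rotated by an angle proportional to the position, and because rotations compose, the dot product of a rotated query and key depends only on the position difference. A sketch (function name and the base 10000 follow the common convention, but this is illustrative, not a production implementation):

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    # Rotate each pair (vec[2i], vec[2i+1]) by angle pos * theta_i,
    # where theta_i = base^(-2i/d). Applied to both queries and keys,
    # the dot product q . k then depends only on relative position.
    d = len(vec)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos * (base ** (-i / d))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out
```

Shifting a query/key pair by the same offset leaves their dot product unchanged, which is exactly the relative-position property the entry describes.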
Positional encoding choice affects how well a model handles long contexts and out-of-distribution positions. RoPE has become the dominant choice in recent open-weight LLMs (e.g., Llama, Mistral), while ALiBi appears in some models for its length-extrapolation properties.
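ALiBi takes a different route from RoPE: instead of modifying embeddings or vectors, it adds a static, head-specific linear penalty to attention scores based on query-key distance. A sketch of the bias matrix construction (the geometric slope schedule shown matches the published recipe for power-of-two head counts; the function name is illustrative):

```python
def alibi_bias(seq_len, num_heads):
    # Head h penalizes attending to key j from query i by -slope_h * (i - j).
    # Slopes form a geometric sequence; future positions are masked out
    # with -inf, matching causal attention.
    slopes = [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]
    return [[[-slopes[h] * (i - j) if j <= i else float("-inf")
              for j in range(seq_len)]
             for i in range(seq_len)]
            for h in range(num_heads)]
```

Because the penalty grows linearly with distance regardless of sequence length, ALiBi-trained models can often attend over sequences longer than those seen in training.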