Models & Architecture
Cross-Attention
An attention mechanism where queries from one sequence attend to keys and values from a different sequence.
Cross-attention lets one sequence condition on another. Unlike self-attention, where a sequence attends to itself, cross-attention has queries from sequence A attending to keys and values from sequence B, as in the sketch below.
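As a concrete illustration, here is a minimal single-head cross-attention sketch in PyTorch (an assumed framework choice); the class name, dimension sizes, and projection layout are illustrative, not taken from any particular library.

```python
import math
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Single-head cross-attention: sequence A queries sequence B (illustrative sketch)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)   # queries come from sequence A
        self.k_proj = nn.Linear(d_model, d_model)   # keys come from sequence B
        self.v_proj = nn.Linear(d_model, d_model)   # values come from sequence B
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, seq_a: torch.Tensor, seq_b: torch.Tensor) -> torch.Tensor:
        # seq_a: (batch, len_a, d_model), seq_b: (batch, len_b, d_model)
        q = self.q_proj(seq_a)
        k = self.k_proj(seq_b)
        v = self.v_proj(seq_b)
        # Scores compare every position in A against every position in B.
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = scores.softmax(dim=-1)             # (batch, len_a, len_b)
        # Each position in A receives a weighted mix of B's values.
        return self.out_proj(weights @ v)

# Example: a 10-token "decoder" sequence attending to a 20-token "encoder" sequence.
decoder_states = torch.randn(2, 10, 64)
encoder_states = torch.randn(2, 20, 64)
out = CrossAttention(64)(decoder_states, encoder_states)  # shape: (2, 10, 64)
```

Note that only the query projection sees sequence A; swapping which sequence feeds the keys and values is what distinguishes this from self-attention.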
It's essential in encoder-decoder models like the original transformer (where the decoder attends to encoder outputs) and in multimodal models (where text attends to image features).
Use cases: neural machine translation (target attends to source), image captioning (text attends to image features), retrieval-augmented generation (the generator attends to retrieved context).
Many modern architectures use cross-attention to fuse information across modalities — text with images, audio with text, or structured data with natural language.