Models & Architecture
Cross-Attention
An attention mechanism where queries from one sequence attend to keys and values from a different sequence.
Cross-attention lets one sequence condition on another. Unlike self-attention, where a sequence attends to itself, cross-attention has queries from sequence A attending to keys and values from sequence B, as in the sketch below.
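As a concrete illustration, here is a minimal single-head cross-attention sketch in PyTorch (an assumed framework choice); the class name, dimension sizes, and projection layout are illustrative, not taken from any particular library.

```python
import math
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Single-head cross-attention: sequence A queries sequence B (illustrative sketch)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)   # queries come from sequence A
        self.k_proj = nn.Linear(d_model, d_model)   # keys come from sequence B
        self.v_proj = nn.Linear(d_model, d_model)   # values come from sequence B
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, seq_a: torch.Tensor, seq_b: torch.Tensor) -> torch.Tensor:
        # seq_a: (batch, len_a, d_model), seq_b: (batch, len_b, d_model)
        q = self.q_proj(seq_a)
        k = self.k_proj(seq_b)
        v = self.v_proj(seq_b)
        # Scores compare every position in A against every position in B.
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = scores.softmax(dim=-1)             # (batch, len_a, len_b)
        # Each position in A receives a weighted mix of B's values.
        return self.out_proj(weights @ v)

# Example: a 10-token "decoder" sequence attending to a 20-token "encoder" sequence.
decoder_states = torch.randn(2, 10, 64)
encoder_states = torch.randn(2, 20, 64)
out = CrossAttention(64)(decoder_states, encoder_states)  # shape: (2, 10, 64)
```

Note that only the query projection sees sequence A; swapping which sequence feeds the keys and values is what distinguishes this from self-attention.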
It's essential in encoder-decoder models like the original transformer (where the decoder attends to encoder outputs) and in multimodal models (where text attends to image features).
Use cases: neural machine translation (target attends to source), image captioning (text attends to image features), retrieval-augmented generation (the generator attends to retrieved context).
Many modern architectures use cross-attention to fuse information across modalities — text with images, audio with text, or structured data with natural language.