Context Window
The maximum amount of text (measured in tokens) that an AI model can process in a single interaction.
The context window is the total amount of text, measured in tokens, that a language model can "see" and work with at one time. This includes the system prompt, the entire conversation history, any documents you paste in, and the model's output. When a conversation outgrows the window, chat applications typically drop or summarize the oldest content to make room.
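The "drop older content" strategy can be sketched in a few lines. This is a minimal illustration, not any provider's actual implementation: `count_tokens` here is a crude whitespace stand-in for a real tokenizer, and the function names are hypothetical.

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer; actual token counts differ.
    return len(text.split())

def fit_to_window(system_prompt, messages, max_tokens):
    """Drop the oldest messages until the conversation fits the window.

    The system prompt is always kept; history is trimmed oldest-first.
    """
    budget = max_tokens - count_tokens(system_prompt)
    kept, used = [], 0
    # Walk newest-to-oldest so the most recent turns survive.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["first question", "first answer", "second question about the report"]
print(fit_to_window("You are a helpful assistant.", history, max_tokens=12))
```

With a 12-token budget, the oldest turn is dropped while the most recent exchange survives intact.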
Context window sizes have grown dramatically: GPT-3 had a 2K-token window (~1,500 words), GPT-4 Turbo reached 128K tokens, and models like Gemini 1.5 Pro support up to 1 million tokens, enough for an entire codebase or roughly ten novels. Larger context windows enable more complex tasks but require more memory and computation.
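The token-to-word conversions above follow the common rule of thumb of about 0.75 English words per token. A quick sketch using that heuristic (an approximation, not a tokenizer):

```python
def tokens_to_words(tokens):
    # Heuristic: ~0.75 English words per token; varies by language and text.
    return int(tokens * 0.75)

for name, window in [("GPT-3", 2_048), ("GPT-4 Turbo", 128_000),
                     ("Gemini 1.5 Pro", 1_000_000)]:
    print(f"{name}: {window:,} tokens ≈ {tokens_to_words(window):,} words")
```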
Why Context Window Matters
- Determines how long a conversation can be before history is lost
- Limits how much document content you can provide for analysis
- Affects cost: more input tokens mean higher API usage fees
- Long-context models can handle tasks, such as whole-codebase analysis, that are impossible with short windows
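The cost point can be made concrete. The per-token prices below are hypothetical placeholders (check your provider's pricing page); the point is only that cost scales linearly with tokens in and out:

```python
INPUT_PRICE_PER_1K = 0.003   # hypothetical $/1K input tokens
OUTPUT_PRICE_PER_1K = 0.015  # hypothetical $/1K output tokens

def request_cost(input_tokens, output_tokens):
    # Cost scales linearly with token counts on both sides.
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A 100K-token document plus a 1K-token answer:
print(f"${request_cost(100_000, 1_000):.3f}")
```

At these illustrative rates, pasting a 100K-token document costs roughly twenty times as much as the 1K-token answer it produces.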
For tasks exceeding the context window, retrieval-augmented generation (RAG) is the standard solution. Instead of loading all documents into context, a retrieval system finds the most relevant chunks and injects only those. This keeps costs manageable and performance high even for large knowledge bases.
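A toy version of the retrieval step shows the idea. Here relevance is scored by simple word overlap; production RAG systems typically use embedding similarity and a vector database instead, and all names below are illustrative.

```python
import re

def tokenize(text):
    # Lowercase and split on non-alphanumeric characters.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query, chunk):
    # Relevance = number of words the query and chunk share.
    return len(tokenize(query) & tokenize(chunk))

def retrieve(query, chunks, k=2):
    """Return the k chunks most relevant to the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

chunks = [
    "Refund policy: refunds are issued within 30 days.",
    "Shipping policy: orders ship within 2 business days.",
    "Warranty policy: hardware is covered for one year.",
]
top = retrieve("refund policy", chunks, k=1)
prompt = "Answer using only this context:\n" + "\n".join(top)
print(prompt)
```

Only the best-matching chunk enters the prompt, so context usage stays constant no matter how large the underlying knowledge base grows.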