RAG & Retrieval
Chunking
The process of splitting long documents into smaller pieces that fit into a language model's context window.
Chunking breaks long documents into smaller passages so they can be embedded, retrieved, and fed to an LLM one at a time. Chunk size is a critical RAG hyperparameter: chunks that are too small lose surrounding context, while chunks that are too large dilute the embedding and make retrieval noisy.
Common strategies include fixed-size chunking (by tokens or characters), sentence-based splitting, recursive character splitting, and semantic chunking that groups related content together.
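Recursive character splitting tries coarse separators first (paragraphs, then sentences, then words) and only recurses to finer ones when a piece is still too long. A minimal sketch, assuming plain-text input and a character budget (the function name and separator list are illustrative, not a specific library's API):

```python
def recursive_split(text, max_len=500, separators=("\n\n", "\n", ". ", " ")):
    """Split text on coarse-to-fine separators until every piece fits max_len."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        if sep in text:
            parts = text.split(sep)
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= max_len:
                    current = candidate  # pack parts greedily into one chunk
                else:
                    if current:
                        chunks.append(current)
                    if len(part) > max_len:
                        # a single part is still too long: recurse with finer separators
                        chunks.extend(recursive_split(part, max_len, separators))
                        current = ""
                    else:
                        current = part
            if current:
                chunks.append(current)
            return chunks
    # no separator present at all: fall back to a hard character cut
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Because paragraph and sentence boundaries are tried first, chunks tend to end at natural breaks rather than mid-sentence, which is why this strategy is a common default.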
Typical sizes: 256-1024 tokens per chunk, often with 10-20% overlap between chunks to preserve context at boundaries.
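Fixed-size chunking with overlap can be sketched as a sliding window over a token list. Here whitespace-split words stand in for real tokens (production systems would use the model's tokenizer), and the function name is illustrative:

```python
def chunk_fixed(tokens, size=256, overlap=32):
    """Slide a window of `size` tokens, stepping by size - overlap,
    so consecutive chunks share `overlap` tokens at their boundary."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final window already covers the tail
    return chunks
```

With size=256 and overlap=32 (about 12%, within the 10-20% range above), a sentence cut off at one chunk's end reappears at the start of the next, so no boundary context is lost.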
Advanced systems use hierarchical chunking: small chunks are embedded for precise matching, but their larger parent sections are returned to the LLM when a small chunk matches a query. Chunking quality often matters more than the choice of embedding model.
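The small-to-big pattern only needs a child-to-parent index alongside the child chunks. A minimal sketch, with word-based splitting standing in for real tokenization and hypothetical function names:

```python
def build_hierarchy(sections, child_size=64):
    """Split each parent section into small child chunks and record
    which parent each child came from (child index -> parent index)."""
    children, parent_of = [], {}
    for p_id, section in enumerate(sections):
        words = section.split()
        for start in range(0, len(words), child_size):
            parent_of[len(children)] = p_id
            children.append(" ".join(words[start:start + child_size]))
    return children, parent_of

def retrieve_parent(child_id, sections, parent_of):
    """After vector search matches a small child chunk,
    return its full parent section for the LLM's context."""
    return sections[parent_of[child_id]]
```

The embedding index is built over `children` (precise matching), while `retrieve_parent` supplies the broader section, giving both retrieval precision and generation context.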