
Reranking

A second-stage retrieval step that reorders initial search results using a more accurate but slower model to improve relevance.

Reranking is a two-stage retrieval pattern. A fast first-stage retriever (such as vector search) fetches a broad set of candidate documents. A slower but more accurate reranker model then reorders those candidates by their relevance to the query. Only the top few reranked results are passed to the LLM.
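A minimal sketch of the two-stage pattern, assuming a toy bag-of-words vector as a stand-in for a real embedding model and a caller-supplied `score_pair(query, doc)` function as a stand-in for the reranker. The names `embed`, `cosine`, `retrieve_then_rerank`, `k`, and `n` are hypothetical and not taken from any library.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    docs: list[str],
    score_pair: Callable[[str, str], float],
    k: int = 10,
    n: int = 3,
) -> list[str]:
    # Stage 1: cheap similarity search. In a real system the document vectors
    # would be precomputed and stored in an index; here they are built inline.
    q_vec = embed(query)
    candidates = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:k]
    # Stage 2: the expensive pair scorer runs only on the k candidates.
    reranked = sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)
    # Only the top n reranked documents would be handed to the LLM.
    return reranked[:n]
```

In practice `score_pair` would call a cross-encoder. `k` sets how many candidates the reranker sees (more candidates can improve recall but add latency), and `n` sets how many documents reach the LLM's context.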

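The reranker is typically a cross-encoder: it reads the query and a document together and outputs a relevance score for that pair, rather than comparing embeddings that were computed separately in advance. Because the model processes both texts jointly, it can capture fine-grained relationships between them, such as word order or negation, that a bi-encoder's independently computed embeddings can miss. The trade-off is cost: every query-document pair needs its own model call, which is why reranking is applied only to the small candidate set from the first stage.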
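The toy sketch below contrasts the two scoring shapes: a bi-encoder computes `sim(encode(query), encode(doc))` from vectors built independently, while a cross-encoder computes a single score from the pair. Both scorers are hypothetical stand-ins, not real models: the "bi-encoder" uses bag-of-words vectors, and the "cross-encoder" uses a token-level longest common subsequence. A real cross-encoder would instead run a transformer over the concatenated query and document.

```python
import math
from collections import Counter


def bi_encoder_score(query: str, doc: str) -> float:
    """Bi-encoder shape: encode each text on its own, then compare the vectors.
    Toy encoder: bag-of-words counts, which discard word order."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def cross_encoder_score(query: str, doc: str) -> float:
    """Cross-encoder shape: one score computed from the pair as a whole.
    Hypothetical stand-in: token-level longest common subsequence (LCS)
    divided by query length, which is sensitive to word order."""
    q, d = query.lower().split(), doc.lower().split()
    # table[i][j] = LCS length of the first i query tokens and first j doc tokens.
    table = [[0] * (len(d) + 1) for _ in range(len(q) + 1)]
    for i, qt in enumerate(q, 1):
        for j, dt in enumerate(d, 1):
            if qt == dt:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(q)][len(d)] / len(q) if q else 0.0


query = "dog bites man"
for doc in ["dog bites man", "man bites dog"]:
    print(doc, round(bi_encoder_score(query, doc), 3), round(cross_encoder_score(query, doc), 3))
```

Both documents contain the same words, so the bag-of-words "bi-encoder" scores them identically, while the pair-level scorer ranks "dog bites man" above "man bites dog". Real embedding models are far less crude than bag-of-words, but the structural limit is the same: the similarity can only use what each vector retained on its own.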

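Why it matters: pure vector search can return results that are semantically similar to the query but do not actually answer it. Because the reranker judges each candidate against the query directly, it can demote many of these false positives before they reach the LLM.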

Popular rerankers include Cohere Rerank, BGE Reranker, and Jina Reranker. Adding a reranker to a RAG pipeline is often one of the highest-leverage quality improvements, with gains of 10-20% on retrieval benchmarks frequently cited; the actual improvement depends on the dataset, the first-stage retriever, and the metric used.
