Reranking
A second-stage retrieval step that reorders initial search results using a more accurate but slower model to improve relevance.
Reranking is a two-stage retrieval pattern. A fast first-stage retriever (like vector search) fetches many candidate documents, then a slower but more accurate reranker model reorders them based on relevance to the query. Only the top few reranked results are passed to the LLM.
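A minimal sketch of the two-stage flow, assuming toy stand-ins for both models. Every name here is hypothetical and not from any library. `embed` is a bag-of-words vector standing in for an embedding model. `phrase_match_score` stands in for a reranker. `n_candidates` and `top_k` are illustrative parameters. Only the control flow is meant to match the pattern: broad cheap recall, then expensive reordering of the short list, then a small cut for the LLM.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    # Hypothetical first-stage "embedding": a bag-of-words count vector,
    # standing in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def two_stage_retrieve(
    query: str,
    docs: List[str],
    rerank_score: Callable[[str, str], float],
    n_candidates: int = 3,
    top_k: int = 1,
) -> List[str]:
    # Stage 1 (fast): rank every document by vector similarity and keep n_candidates.
    doc_vecs = [embed(d) for d in docs]  # a real system precomputes these at index time
    q_vec = embed(query)
    candidates = sorted(
        range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True
    )[:n_candidates]
    # Stage 2 (slower, more accurate): score each (query, candidate) pair and reorder.
    reranked = sorted(candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True)
    # Only the top_k reranked documents are passed on to the LLM.
    return [docs[i] for i in reranked[:top_k]]


def phrase_match_score(query: str, doc: str) -> float:
    # Hypothetical reranker stand-in: 1.0 if the exact query phrase occurs in the doc.
    return 1.0 if query.lower() in doc.lower() else 0.0


docs = [
    "man bites dog in local park",
    "dog bites man near the river",
    "recipes for banana bread",
    "a dog and a man went for a walk",
]
print(two_stage_retrieve("dog bites man", docs, phrase_match_score))
# prints ['dog bites man near the river']
```

The first stage cannot tell the first two documents apart because they contain the same words, so it keeps the original order. The second stage looks at each query-document pair and promotes the one that actually matches.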
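The reranker is typically a cross-encoder: a model that takes the query and a document together as one input and outputs a relevance score directly, rather than comparing pre-computed embeddings. Because it reads both texts jointly, it can capture query-document interactions (such as word order or negation) that bi-encoder embeddings, computed separately for each text, can miss. The cost is that every candidate must pass through the model at query time, which is why reranking is applied only to the short candidate list and not to the whole corpus.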
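The contrast between the two scoring styles can be made concrete with toy functions. Neither is a neural model. `bi_encoder_embed` and `cross_encoder_score` are hypothetical names, chosen so the difference shows up in a few lines. The bi-encoder side encodes each text on its own, so document vectors could be cached. The pairwise side reads query and document together and, in this toy, counts shared ordered word pairs. A real cross-encoder instead runs the concatenated pair through a transformer.

```python
from collections import Counter


def bi_encoder_embed(text: str) -> Counter:
    # Toy bi-encoder: each text is encoded independently (bag of words).
    # Document vectors can therefore be precomputed, but word order is lost.
    return Counter(text.lower().split())


def bi_encoder_score(query: str, doc: str) -> int:
    # Score = dot product of two independently computed vectors.
    q, d = bi_encoder_embed(query), bi_encoder_embed(doc)
    return sum(count * d[term] for term, count in q.items())


def cross_encoder_score(query: str, doc: str) -> int:
    # Toy pairwise scorer standing in for a cross-encoder: it sees the
    # (query, doc) pair together and counts ordered word pairs they share,
    # so it is sensitive to word order.
    q, d = query.lower().split(), doc.lower().split()
    return len(set(zip(q, q[1:])) & set(zip(d, d[1:])))


query = "dog bites man"
for doc in ["dog bites man", "man bites dog"]:
    print(doc, bi_encoder_score(query, doc), cross_encoder_score(query, doc))
```

The bi-encoder gives both documents the same score of 3. The pairwise scorer gives 2 to the matching word order and 0 to the reversed one.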
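Popular rerankers include Cohere Rerank, BGE Reranker, and Jina Reranker. Adding a reranker to a RAG pipeline is often one of the highest-leverage quality improvements, because it slots in after the existing retriever without changing the index, at the cost of extra latency per query. Gains of 10-20% on retrieval benchmarks are commonly reported, though the actual improvement depends on the dataset, the first-stage retriever, and the reranker model.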