Inference & Optimization

Top-K Sampling

A text generation strategy that restricts sampling to the K most likely next tokens at each step.

Top-K sampling is a decoding strategy where the model only considers the K tokens with the highest probability at each generation step, then samples from those K according to their renormalized probabilities.

It strikes a balance between greedy decoding (always pick the single most likely token) and sampling from the full distribution (consider every token in the vocabulary). Small K values make output more deterministic; larger K values increase diversity but admit lower-probability tokens.

Typical values: K=40 to K=100 for general generation. K=1 is equivalent to greedy decoding.
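The procedure can be sketched in a few lines: keep the K largest logits, renormalize with a softmax over just those K, and draw a token. This is a minimal NumPy sketch (the function name and toy logits are illustrative, not from any particular library):

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample a token id from the k highest-probability logits."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    # Indices of the k largest logits (order among the k doesn't matter).
    top_indices = np.argpartition(logits, -k)[-k:]
    top_logits = logits[top_indices]
    # Softmax over only the kept logits renormalizes their probabilities.
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    return int(rng.choice(top_indices, p=probs))

# Toy vocabulary of 6 tokens; with k=1 this reduces to greedy decoding.
logits = [2.0, 1.0, 0.5, -1.0, -3.0, 0.0]
print(top_k_sample(logits, k=1))  # always 0, the argmax
```

With `k=3` the sample is always one of tokens 0, 1, or 2; the remaining tokens are excluded no matter how the draw goes.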

Top-K is often combined with top-P sampling and temperature for fine-grained control over generation. Modern LLMs typically use top-P as the primary cutoff.
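When the three controls are combined, a common arrangement is to apply temperature first, then the top-K cutoff, then the top-P cutoff within the surviving tokens. The ordering and default values below are illustrative assumptions, not a fixed standard shared by all implementations:

```python
import numpy as np

def sample_token(logits, temperature=1.0, k=50, p=0.9, rng=None):
    """Temperature -> top-k -> top-p filtering, then sample one token id.

    The pipeline order and the defaults (k=50, p=0.9) are assumptions
    for illustration; real decoders vary in both.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-k: keep only the k most probable tokens.
    keep = np.argsort(probs)[::-1][:k]
    # Top-p: within those, keep the smallest prefix whose mass reaches p.
    cum = np.cumsum(probs[keep])
    keep = keep[: int(np.searchsorted(cum, p)) + 1]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))
```

Lowering the temperature sharpens the distribution before either cutoff applies, so a dominant token can pass both filters alone; raising it flattens the distribution, letting more of the K tokens survive the top-P stage.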
