Inference & Optimization
Nucleus Sampling
Top-P Sampling
A text generation strategy that samples from the smallest set of tokens whose cumulative probability is at least P.
Top-P (nucleus) sampling dynamically adjusts the candidate pool based on probability mass rather than a fixed count. The model keeps the smallest set of tokens whose combined probability is at least P, then samples from that set.
This is more adaptive than top-K: at one step the model might sample from 5 tokens, at another from 50, depending on how confident the distribution is.
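The nucleus step above can be sketched in a few lines of NumPy. This is an illustrative implementation, not taken from any particular library: sort the distribution, keep the smallest prefix whose cumulative mass reaches P, renormalize, and sample within that prefix.

```python
import numpy as np

def top_p_sample(probs, p=0.9, rng=None):
    """Sample a token index via nucleus (top-p) sampling. Illustrative sketch."""
    rng = rng or np.random.default_rng()
    # Sort token probabilities in descending order
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # Smallest prefix whose cumulative probability is at least p
    cumulative = np.cumsum(sorted_probs)
    cutoff = np.searchsorted(cumulative, p) + 1  # number of tokens kept
    # Renormalize within the nucleus and sample
    nucleus = sorted_probs[:cutoff]
    nucleus = nucleus / nucleus.sum()
    return int(order[rng.choice(cutoff, p=nucleus)])
```

Note how the nucleus size (`cutoff`) varies with the shape of the distribution: a peaked distribution yields a nucleus of one or two tokens, a flat one yields many.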
Typical values: P=0.9 or P=0.95. Lower values produce more focused, predictable text; higher values produce more diverse output.
Top-P is often preferred over top-K for modern LLM generation because it adapts to the model's uncertainty. It is commonly combined with temperature to control overall randomness.