Inference & Optimization
Nucleus Sampling
Top-P Sampling
A text generation strategy that samples from the smallest set of tokens whose cumulative probability is at least P.
Top-P (nucleus) sampling dynamically adjusts the candidate pool based on probability mass rather than a fixed count. The model keeps the smallest set of tokens whose combined probability is at least P, then samples from that set.
This is more adaptive than top-K: at one step the model might sample from 5 tokens, at another from 50, depending on how confident the distribution is.
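The nucleus step above can be sketched in a few lines of NumPy. This is an illustrative implementation, not taken from any particular library: sort the distribution, keep the smallest prefix whose cumulative mass reaches P, renormalize, and sample within that prefix.

```python
import numpy as np

def top_p_sample(probs, p=0.9, rng=None):
    """Sample a token index via nucleus (top-p) sampling. Illustrative sketch."""
    rng = rng or np.random.default_rng()
    # Sort token probabilities in descending order
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # Smallest prefix whose cumulative probability is at least p
    cumulative = np.cumsum(sorted_probs)
    cutoff = np.searchsorted(cumulative, p) + 1  # number of tokens kept
    # Renormalize within the nucleus and sample
    nucleus = sorted_probs[:cutoff]
    nucleus = nucleus / nucleus.sum()
    return int(order[rng.choice(cutoff, p=nucleus)])
```

Note how the nucleus size (`cutoff`) varies with the shape of the distribution: a peaked distribution yields a nucleus of one or two tokens, a flat one yields many.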
Typical values: P=0.9 or P=0.95. Lower values produce more focused, predictable text; higher values produce more diverse output.
Top-P is often preferred over top-K for modern LLM generation because it adapts to the model's uncertainty. It is commonly combined with temperature to control overall randomness.