Language & Text
Token
The basic unit of text processed by a language model, often representing a word, subword, punctuation mark, or symbol.
A token is the atomic unit a language model reads and predicts. Tokens are not always full words. Depending on the tokenizer, a token might be a complete word, part of a word, punctuation, or even whitespace.
For example, the sentence "AI is useful." may be split into a handful of tokens like "AI", " is", " useful", and ".". Note the leading spaces: many tokenizers attach whitespace to the token that follows it. This token-level view is how models measure prompt size, output length, and API costs.
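A minimal sketch of this kind of split, using only a regular expression. This is illustrative, not a real tokenizer: production tokenizers (BPE, SentencePiece, and similar) learn subword vocabularies from data, so their boundaries differ from a simple word split.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Illustrative only: attach any leading whitespace to the word
    # that follows it, and treat other symbols as their own tokens.
    return re.findall(r"\s?\w+|\s?\S", text)

print(toy_tokenize("AI is useful."))  # ['AI', ' is', ' useful', '.']
```

Counting the elements of such a split is a rough proxy for prompt size; for accurate counts you need the tokenizer that matches your model.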
Practical rule: Tokens are the unit that matters for both pricing and capacity. More tokens mean higher cost and more of the context window consumed.
Where Tokens Matter
- Prompt limits — context windows are measured in tokens
- Billing — API providers usually charge per token
- Latency — more generated tokens mean longer generation times
- Model behavior — token boundaries affect text handling
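The billing point above can be made concrete with a small cost estimator. The rates and the per-1K-token pricing scheme here are hypothetical assumptions for illustration; real providers publish their own prices and schemes.

```python
def estimate_cost(prompt_tokens: int, output_tokens: int,
                  input_price_per_1k: float,
                  output_price_per_1k: float) -> float:
    """Estimate one request's cost from token counts.

    Prices are per 1,000 tokens, a common billing convention;
    input and output tokens are usually priced differently.
    """
    return (prompt_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)

# Hypothetical rates: $0.01 per 1K input tokens, $0.03 per 1K output tokens.
print(round(estimate_cost(1500, 500, 0.01, 0.03), 4))  # 0.03
```

The same arithmetic explains capacity: a model with an 8,000-token context window that receives a 6,000-token prompt has at most 2,000 tokens left for its response.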
Understanding tokens is essential for anyone building with AI APIs. It explains why long prompts cost more, why code-heavy inputs can be expensive, and why tokenization choices matter in production systems.