A token is a subword piece of text that a language model treats as a single unit. The tokenizer (a small algorithm that runs before the model) splits raw input text into tokens using a vocabulary of typically 50K-200K pieces. Common English words are usually one token; rare words split into multiple tokens. "Hello world" is 2 tokens; "antidisestablishmentarianism" might be 6. Code, URLs, and non-English text generally use more tokens per character.
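A quick way to see this is to run a tokenizer directly. The sketch below uses OpenAI's tiktoken library with the cl100k_base vocabulary, which is an assumption for illustration; exact token counts vary from one vocabulary to another.

```python
# Illustrative sketch using tiktoken (assumed; the vocabulary choice is arbitrary).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a vocabulary of roughly 100K pieces

for text in ["Hello world", "antidisestablishmentarianism", "https://example.com/a/b?q=1"]:
    ids = enc.encode(text)
    print(f"{text!r}: {len(ids)} tokens -> {ids}")
```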
Each token in the vocabulary maps to a numerical ID. The model only ever sees IDs, never raw text. When generating, the model outputs a probability distribution over the entire vocabulary at each step, picks one token according to a sampling strategy (greedy, temperature, top-p, and so on), appends it to the sequence, and repeats. The output IDs are detokenized back into text at the end.
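The loop looks roughly like the toy sketch below. The "model" here is a stand-in that returns random logits over a tiny vocabulary; every name in it is illustrative, not a real API, but the structure (predict, sample, append, repeat) is the one described above.

```python
# Toy generation loop with a fake model; purely illustrative.
import numpy as np

VOCAB_SIZE = 8  # tiny stand-in vocabulary
rng = np.random.default_rng(0)

def fake_model(ids):
    """Stand-in for a real model: returns one logit per vocabulary token."""
    return rng.normal(size=VOCAB_SIZE)

def sample(logits, temperature=1.0):
    """Turn logits into probabilities and sample one token ID."""
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(VOCAB_SIZE, p=probs))

ids = [3, 1, 4]            # the "prompt", already tokenized into IDs
for _ in range(5):         # generate 5 new tokens
    logits = fake_model(ids)
    ids.append(sample(logits))

print(ids)                 # detokenization would map these IDs back to text
```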
Token count drives almost everything in LLM economics: prompt cost, generation cost, latency, and context window usage. A typical pricing model might charge $3 per million input tokens and $15 per million output tokens.
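At those illustrative rates, a back-of-the-envelope cost per request is straightforward to compute; the token counts below are made up for the example.

```python
# Cost estimate at the illustrative rates above: $3/M input, $15/M output.
input_tokens = 2_000       # a long-ish prompt
output_tokens = 500        # a moderate reply
cost = input_tokens / 1e6 * 3.00 + output_tokens / 1e6 * 15.00
print(f"${cost:.4f} per request")   # about $0.0135
```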
Tokens are the unit you actually pay for and the unit that fills your context window. Optimizing prompt length, choosing efficient tokenizers, and trimming output verbosity all map directly to cost and latency.
The Free Tier Is Dead article touches on token economics for self-hosted vs hosted LLMs.