Context Window

AI & MACHINE LEARNING

Quick Definition

The context window is the total number of tokens a language model can hold in attention at any given moment. It is the hard upper bound on how much text the model can read in a single request. Modern frontier models have context windows ranging from 8K tokens (older or smaller models) to 1M+ tokens (Claude with extended context, Gemini 1.5 Pro, GPT-4.1). Everything counts against this limit: the system prompt, the user's message, prior conversation turns, retrieved documents, tool definitions, and the model's own response.
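Because every component shares the same budget, it can help to account for them explicitly. A minimal sketch, using hypothetical round-number token counts (not measurements from any real model):

```python
# Illustrative accounting of what occupies a context window.
# All token counts here are hypothetical examples.

WINDOW = 200_000  # e.g. a model with a 200K-token context window

budget = {
    "system prompt": 1_500,
    "tool definitions": 3_000,
    "retrieved documents": 40_000,
    "conversation history": 25_000,
    "current user message": 500,
}

used = sum(budget.values())          # input-side tokens already committed
reserved_for_response = 8_000        # headroom for the model's own output
available = WINDOW - used - reserved_for_response
```

Note that the response headroom must be reserved up front: if the input fills the window, the model has no room left to answer.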

How it works

Tokens are the subword units the model reads and writes internally. As a rough rule of thumb, 1 token ≈ 0.75 English words, so a 128K context window holds about 96,000 words, on the order of a few hundred pages of text. The compute cost of self-attention grows roughly quadratically with context length, and the memory for the key-value cache grows linearly, which is why long-context requests are more expensive.
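The words-per-token heuristic above can be turned into a quick budget check. A minimal sketch (the 0.75 ratio varies by tokenizer and language, so treat this as an estimate, not a real token count; `fits_in_window` and its defaults are illustrative):

```python
# Rough token budgeting from word counts, using the ~0.75
# words-per-token heuristic for English prose. For exact counts
# you would use the model's actual tokenizer.

WORDS_PER_TOKEN = 0.75  # heuristic; varies by tokenizer and language

def estimate_tokens(text: str) -> int:
    """Estimate token count from a whitespace word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)

def fits_in_window(text: str, window: int = 128_000,
                   reserve: int = 4_000) -> bool:
    """Check whether text plausibly fits, reserving room for the response."""
    return estimate_tokens(text) <= window - reserve
```

For example, a 96,000-word document estimates to 128,000 tokens, exactly the heuristic in the text above.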

When the input exceeds the context window, the application must truncate, summarize, or use RAG to inject only the most relevant subset. Most agent frameworks include automatic context management that keeps recent messages and a rolling summary of older ones.
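The "recent messages plus a rolling summary" strategy can be sketched in a few lines. This is a simplified illustration, not any particular framework's API: `summarize` is a placeholder (real systems call a model to produce the summary), and the summary's own token cost is not re-checked here.

```python
# Minimal sketch of rolling context management: keep the system
# prompt and the most recent turns, folding evicted older turns
# into a single summary message.

def summarize(messages):
    # Placeholder: a real implementation would ask an LLM to summarize.
    return f"[summary of {len(messages)} earlier messages]"

def trim_context(messages, max_tokens, count_tokens):
    """Evict the oldest non-system messages until the budget fits,
    then prepend one rolling-summary message for what was dropped."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    dropped = []

    def total():
        return sum(count_tokens(m["content"]) for m in system + rest)

    while rest and total() > max_tokens:
        dropped.append(rest.pop(0))  # evict the oldest turn first

    if dropped:
        # Note: the summary itself consumes tokens; production code
        # would re-check the budget after inserting it.
        rest.insert(0, {"role": "assistant", "content": summarize(dropped)})
    return system + rest
```

The key design point is that the system prompt is never evicted, since it defines the model's behavior for every turn.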

Why it matters

Context window size determines what kinds of tasks an LLM can do in a single shot. Small windows force aggressive summarization and lose nuance. Large windows let you drop entire codebases, books, or transcripts into a prompt. The 1M-token threshold (achieved 2024-2025) makes whole-document analysis tractable in one call.

Where you'll see this on TerminalFeed

The AI Agents article on the TerminalFeed blog covers how agents manage context as conversations grow longer than any single context window.