Glossary

Context Window

Definition

A context window is the total amount of information an AI model can hold in active memory during a single interaction, measured in tokens. Everything the model can see and reference when generating a response — the system instructions, retrieved documents, conversation history, user input, and any structured data — must fit within this limit. Information outside the context window is invisible to the model for that interaction.

Understanding context window limits is essential for building reliable AI workflows. A model that can’t fit all the relevant information into its context will produce outputs that are generic, incomplete, or inconsistent with the full picture.

How does a context window work?

A context window works like a fixed-size workspace: everything the model needs to do its job must be placed on the desk before work begins. One token is roughly 0.75 words in English, so a 128,000-token context window holds approximately 96,000 words — around 300 pages of text. Anthropic’s Claude models support up to 200,000 tokens, or roughly 150,000 words, which is large enough to process a full-length business report, a complete email history with a client, or a substantial product documentation set in a single pass.
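The word-to-token ratio above can serve as a quick sizing check. This is only a sketch: the 0.75 words-per-token figure is a rough English-language heuristic, and real tokenizers vary by model and language, so a production system should count tokens with the model's actual tokenizer. The function names and the 200K default are illustrative.

```python
# Rough context-window sizing using the ~0.75 words-per-token heuristic.
# Real tokenizers vary by model and language; treat this as an estimate only.

WORDS_PER_TOKEN = 0.75

def estimate_tokens(text: str) -> int:
    """Estimate the token count of a piece of text from its word count."""
    word_count = len(text.split())
    return round(word_count / WORDS_PER_TOKEN)

def fits_in_context(text: str, context_limit: int = 200_000) -> bool:
    """Check whether a document likely fits within a given context window."""
    return estimate_tokens(text) <= context_limit

report = "word " * 96_000           # ~96,000 words, about 300 pages
print(estimate_tokens(report))      # ~128,000 tokens
print(fits_in_context(report))      # fits in a 200K-token window
```

Running the check before sending a request lets a workflow decide up front whether to send a document whole or fall back to retrieval.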

When a context window fills up, something must give. Systems either truncate the oldest content (cutting off earlier conversation history), exclude lower-priority documents, or use techniques like Retrieval-Augmented Generation (RAG) to dynamically load only the most relevant content at the moment it’s needed — keeping the context focused and within limits.
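The first strategy — truncating the oldest content — can be sketched in a few lines. This is an illustrative sketch, not a library API: `estimate_tokens` is a crude word-count stand-in for a real tokenizer, and the budget value is arbitrary.

```python
# Sketch of oldest-first truncation: drop the earliest conversation turns
# until the remaining history fits within a token budget. The token count
# is a crude word-based estimate standing in for a real tokenizer.

def estimate_tokens(text: str) -> int:
    return round(len(text.split()) / 0.75)

def truncate_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break                       # everything older is cut off
        kept.append(message)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = ["old " * 60, "mid " * 60, "new " * 60]  # 80 tokens each
print(truncate_history(history, budget=170))       # oldest turn is dropped
```

RAG takes the opposite approach: instead of cutting from a fixed pile, it selects only the relevant pieces to load in the first place.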

Why does context window size matter for small businesses?

Context window size matters for small businesses because it determines what an AI agent can actually see and use when doing a task. An agent writing a client proposal needs access to the client’s history, your service catalogue, any relevant past work, and the current brief — all at once. If those documents collectively exceed the model’s context window, the agent is working blind on part of the picture.

Larger context windows reduce the need for complex retrieval engineering. A model that can load an entire client folder without hitting token limits will produce more coherent, consistent outputs than one that must selectively retrieve fragments and piece them together. According to Anthropic’s 2024 model release documentation, increasing context window capacity was one of the most-requested improvements from enterprise customers — teams running document-heavy workflows saw immediate quality improvements when moving from 32K to 100K+ token models.

What is the difference between a context window and memory?

A context window is temporary — it resets between interactions unless explicitly managed. Memory refers to systems built on top of the context window to persist information across sessions: writing key facts to a database, retrieving them at the start of each new session, and inserting them into the context. Memory is an architectural pattern; the context window is the fundamental constraint that makes memory necessary.

            Context Window              Memory Systems
Duration    Single interaction          Persists across sessions
Mechanism   Native to the model         Built on top — database + retrieval
Limit       Fixed (128K–200K tokens)    Effectively unlimited
Example     Active conversation         Client history stored in a database
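The memory pattern above amounts to a thin layer around the model call: persist key facts after a session, then load them back into the context at the start of the next. The sketch below is illustrative only — the dict stands in for a real database, and all names are hypothetical, not any library's API.

```python
# Sketch of a memory system built on top of a context window: facts are
# persisted between sessions and re-inserted into the prompt each time.
# The "store" is a plain dict standing in for a real database.

store: dict[str, list[str]] = {}

def save_fact(client_id: str, fact: str) -> None:
    """Persist a key fact so later sessions can reference it."""
    store.setdefault(client_id, []).append(fact)

def build_context(client_id: str, user_input: str) -> str:
    """Start a new session: load stored facts back into the context."""
    facts = store.get(client_id, [])
    memory_block = "\n".join(f"- {fact}" for fact in facts)
    return (f"Known client facts:\n{memory_block}\n\n"
            f"User request:\n{user_input}")

save_fact("acme", "Prefers fixed-price quotes")
save_fact("acme", "Fiscal year ends in March")
print(build_context("acme", "Draft a renewal proposal"))
```

The context window still bounds how much stored memory can be loaded per session, which is why memory systems typically save distilled facts rather than full transcripts.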

FAQ

What is a context window in AI?

A context window is the maximum amount of text an AI model can read and use in one interaction, measured in tokens. About 750 words equals roughly 1,000 tokens.

Why does context window size matter?

A larger context window lets the model read longer documents, more conversation history, or more retrieved data — producing more accurate and relevant responses.

What happens when you exceed the context window?

Content that doesn’t fit is either truncated or excluded. The model can’t reference information outside its active context, which can cause incomplete or inaccurate outputs.

How big are current AI context windows?

Current frontier models support 128,000 to 200,000 tokens. Anthropic's Claude models support up to 200,000 tokens — roughly 150,000 words or a full-length novel.