9. Context Engineering — The Real Discipline

  ┌─────────────────────────────────────────────────────┐
  │         CONTEXT WINDOW (e.g. 200K tokens)           │
  │                                                     │
  │  ┌─────────────────────────────────────────────┐    │
  │  │  System prompt                              │    │
  │  ├─────────────────────────────────────────────┤    │
  │  │  Tool definitions                           │    │
  │  ├─────────────────────────────────────────────┤    │
  │  │  Examples                                   │    │
  │  ├─────────────────────────────────────────────┤    │
  │  │  Retrieved documents                        │    │
  │  ├─────────────────────────────────────────────┤    │
  │  │  Images / file attachments                  │    │
  │  ├─────────────────────────────────────────────┤    │
  │  │  Conversation history                       │    │
  │  ├─────────────────────────────────────────────┤    │
  │  │  Tool results from previous steps           │    │
  │  ├─────────────────────────────────────────────┤    │
  │  │  Current user message                       │    │
  │  └─────────────────────────────────────────────┘    │
  │                                                     │
  │  ·················································  │
  │  ············· free space for output ·············  │
  │  ·················································  │
  │                                                     │
  └─────────────────────────────────────────────────────┘

Everything we've covered — system prompt, retrieved documents, images, tool definitions, conversation history, thinking tokens — competes for the same limited space in the context window. The model also has its trained knowledge (the parameters from section 1), but at runtime, the context window is the only input you can control. If a fact isn't in the window and wasn't in the training data, it doesn't exist for the model.
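The budget arithmetic is worth making concrete. Here is a rough sketch of how the sections in the window diagram above compete for a fixed budget. The section sizes are made up for illustration, and `approx_tokens` uses the common ~4-characters-per-token heuristic; a real tokenizer (e.g. tiktoken) would give exact counts:

```python
# Rough token-budget sketch for a 200K-token context window.
# The section contents below are hypothetical placeholders.

def approx_tokens(text: str) -> int:
    # Crude estimate: ~4 characters per token. Not a real tokenizer.
    return max(1, len(text) // 4)

WINDOW = 200_000  # input and output share this budget

# Hypothetical sizes for each section of the window diagram above.
sections = {
    "system_prompt": "You are a helpful assistant..." * 100,
    "tool_definitions": '{"name": "search", ...}' * 200,
    "retrieved_documents": "Relevant doc text..." * 5_000,
    "conversation_history": "user: ...\nassistant: ..." * 2_000,
    "current_message": "Summarize the attached report.",
}

used = sum(approx_tokens(text) for text in sections.values())
free_for_output = WINDOW - used

print(f"used: {used} tokens, free for output: {free_for_output}")
```

Every token a retrieved document or old conversation turn occupies is a token the model cannot spend on its answer, which is why the sections above have to be budgeted against each other rather than grown independently.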

All of the machinery we've seen so far — the conversation management from section 3, the tool execution from section 6, the agentic loop from section 7 — is built by developers, not the LLM. This is the application around the model: everything except the LLM itself. The LLM is the engine; the application is the rest of the car. Engineers use several overlapping terms for this layer — "application layer," "orchestration layer," "harness," sometimes "the stack." They all describe aspects of the same basic idea: code that surrounds and directs the model.
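To make that division of labor concrete, here is a minimal sketch of the surrounding layer with the model itself stubbed out. Everything in this snippet except the body of `call_model` is "the application"; the names are illustrative, not any particular SDK:

```python
# Minimal sketch of the application layer around an LLM.
# call_model is a stub standing in for the real API call; every
# other line is developer-written orchestration code.

def call_model(messages: list[dict]) -> dict:
    # Stub: a real implementation would send `messages` to a model API.
    return {"role": "assistant", "content": "stubbed reply"}

class Harness:
    def __init__(self, system_prompt: str):
        # The application, not the model, owns conversation state.
        self.messages = [{"role": "system", "content": system_prompt}]

    def turn(self, user_message: str) -> str:
        # The application decides what goes into the window each call.
        self.messages.append({"role": "user", "content": user_message})
        reply = call_model(self.messages)
        self.messages.append(reply)
        return reply["content"]

harness = Harness("You are a concise assistant.")
print(harness.turn("Hello"))
```

Swap the stub for a real API call and this is the skeleton of the engine-and-car split: the model only ever sees the `messages` list the harness chooses to send.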

Context engineering is the discipline of controlling what the model sees on every call — and what it doesn't. Concretely, it means designing across four dimensions: what to include (system prompt, tool definitions, retrieved documents), how to order and format it, what to compress or summarize as history grows, and what to leave out entirely.

This matters in practice because the window is zero-sum: every token spent on context is a token unavailable for output, and stale or irrelevant context doesn't just waste space; it can actively steer the model toward worse answers.
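A common such decision, dropping the oldest conversation turns to stay within a token budget, can be sketched as follows. The function names and the 4-characters-per-token estimate are illustrative, not a real tokenizer or any specific library:

```python
# Sketch of the "what the model doesn't see" side of context
# engineering: always keep the system prompt and current message,
# then fit as much recent history as the budget allows, newest-first.

def approx_tokens(text: str) -> int:
    # Crude estimate: ~4 characters per token. Not a real tokenizer.
    return max(1, len(text) // 4)

def build_context(system: str, history: list[str], current: str,
                  budget: int) -> list[str]:
    fixed = approx_tokens(system) + approx_tokens(current)
    remaining = budget - fixed
    kept: list[str] = []
    for turn in reversed(history):   # walk newest to oldest
        cost = approx_tokens(turn)
        if cost > remaining:
            break                    # everything older is dropped
        kept.append(turn)
        remaining -= cost
    return [system] + list(reversed(kept)) + [current]

ctx = build_context(
    system="You are a helpful assistant.",
    history=["turn one " * 50, "turn two " * 50, "turn three " * 50],
    current="What did I just say?",
    budget=300,
)
```

With this budget, the oldest turn no longer fits and silently disappears from the model's view. For the model, that turn now exists only if it was also in the training data; production systems often summarize dropped turns rather than discard them outright.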

What's commonly called "Prompt Engineering" — techniques like providing examples, chain-of-thought reasoning, or careful phrasing — is real and useful. But for most AI products, the prompt you type is a small fraction of what determines output quality. The rest is system prompts, retrieved documents, tool definitions, conversation history, and thinking tokens — all managed automatically by the application around the model. That's why the more precise term is context engineering: it's not just about your prompt, it's about everything the model sees.

← 8. Multi-Agent
What We Didn't Cover →