What is Context Engineering
Large language models are text completion engines at their core. They continue from whatever context you give them, predicting the most likely next tokens. Context is everything the model sees before it answers: system prompts, chat history, tool results, documents, UI metadata.
Context engineering is designing everything the model sees before it generates a response. In other words, effective AI implementations focus on building high-quality context that leads to high-quality outputs.
Text in, text out
The first GPT models were simple text-in, text-out systems. You gave them text and the model generated a continuation of that text.
In the early days, people built clever prompt templates to guide the model's output. But fundamentally, the work was about building a chain of text that led the model to produce the desired result. This principle still holds true with modern LLMs.
LLMs are probabilistic text generators at their core: they predict the next token based on the preceding context. This means every token or word you provide, whether in the user input or the system prompt, can affect what comes next.
Note: in real life, tokens are pieces of words, punctuation, or spaces, but for clarity the examples here use whole words. For a more in-depth dive into transformers, see the videos linked at the end of this article.
How chat models work
Most LLMs are trained on chat templates that use special tokens like <|system|>, <|user|>, <|assistant|>, <|tool|>, and </s>, which give structure to multi-turn chats, tool use, reasoning, and so on.
The key point: under the hood it's still text-in text-out. We still want to build a chain of context that leads the model to produce the desired output.
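To make that concrete, here is a minimal sketch of how a chat might be flattened into a single text sequence before the model sees it. The exact template and special tokens vary by model family; the ones below simply reuse the placeholder tokens mentioned above.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Flatten a chat into one text sequence using placeholder special tokens.
// Real chat templates differ per model; this only illustrates the idea.
function renderChat(messages: Message[]): string {
  const rendered = messages
    .map((m) => `<|${m.role}|>\n${m.content}</s>`)
    .join("\n");
  // The trailing assistant tag is where the model starts completing.
  return `${rendered}\n<|assistant|>\n`;
}

console.log(
  renderChat([
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is context engineering?" },
  ])
);
```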
Tools, MCP, and the application layer
Tool use (function calling) is the big shift that made LLMs far more useful beyond raw text generation. When an LLM uses tools like web search, code execution, or external APIs, it is not directly browsing the web or calling those APIs itself.
Instead, the model generates text commands that the application layer interprets, executes, and feeds back into the model as more context.
1. The model generates a text command that matches a schema you define.
2. Your application parses that command and runs the real tool or API request.
3. You feed the result back into the model as more context.
4. The model continues the completion with this new information.
Tools: These are structured definitions of what external capabilities the model can invoke. Each tool has a name, description, input schema, and output schema. An LLM might have multiple tools available, like "web_search", "get_current_temperature", or "execute_code" and choose which one to call based on the user's request.
Application Layer: This is the app where you call the LLM API, manage context, parse model outputs, execute tool calls, and handle the overall workflow. The application layer interprets the model's text commands, runs the actual tools or APIs, and feeds results back into the model. This is the part you are most likely building.
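A minimal sketch of that application-layer loop, assuming a callModel function that stands in for whatever LLM API you use (the names and message shapes here are illustrative, not a specific SDK's API):

```typescript
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ModelReply =
  | { type: "text"; text: string }
  | { type: "tool_call"; call: ToolCall };
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Real tool implementations, keyed by the names the model knows about.
const tools: Record<string, (args: Record<string, unknown>) => Promise<string>> = {
  get_current_temperature: async (args) => `21°C in ${String(args.city)}`,
};

// callModel receives the whole context and returns either final text
// or a requested tool call.
async function runAgent(
  messages: Message[],
  callModel: (messages: Message[]) => Promise<ModelReply>
): Promise<string> {
  while (true) {
    const reply = await callModel(messages);
    if (reply.type === "text") return reply.text; // no tool needed, we're done

    // The model only *requested* a tool call; the application executes it...
    const result = await tools[reply.call.name](reply.call.arguments);

    // ...and feeds the result back as more context for the next completion.
    messages.push({ role: "assistant", content: JSON.stringify(reply.call) });
    messages.push({ role: "tool", content: result });
  }
}
```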
Model Context Protocol (MCP) turns application APIs into tool definitions. You can think of MCP as the UI of apps for LLMs: instructions that let models reliably talk to external systems. For example, a project management app might expose tools like "create_task", "list_tasks", and "update_task_status" with clear schemas the LLM can use. This is also how you build the upcoming ChatGPT Apps, which let ChatGPT use third-party apps directly from the chat interface.
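For illustration, a tool definition for that hypothetical project management app might look roughly like this. The shape mirrors the name / description / input schema pattern described above; the exact wiring depends on your MCP SDK.

```typescript
// Illustrative tool definition: a name, a human-readable description the model
// reads, and a JSON Schema describing the accepted input.
const createTaskTool = {
  name: "create_task",
  description: "Create a new task in the project management app.",
  inputSchema: {
    type: "object",
    properties: {
      title: { type: "string", description: "Short title of the task" },
      assignee: { type: "string", description: "Username of the assignee" },
      due_date: { type: "string", description: "Due date in YYYY-MM-DD format" },
    },
    required: ["title"],
  },
} as const;
```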
How a tool call works:
- The system message provides context and instructions to the model
- The user asks a natural language question
- The assistant generates a <tool_call> with structured JSON parameters
- Your application executes the tool and returns the result as a tool message
- The assistant reads the tool result and formats it into a natural response for the user
Note: The model doesn't execute tools itself. It requests tool calls, and your application handles the execution and returns results.
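As a rough sketch, the context for such an exchange might look like the following sequence of messages. The <tool_call> formatting and role names vary by model and API; this is purely illustrative.

```typescript
// Illustrative message sequence for a single tool-using exchange.
const conversation = [
  { role: "system", content: "You are a weather assistant. Use tools when needed." },
  { role: "user", content: "How warm is it in Helsinki right now?" },
  {
    role: "assistant",
    content:
      '<tool_call>{"name":"get_current_temperature","arguments":{"city":"Helsinki"}}</tool_call>',
  },
  { role: "tool", content: '{"temperature_c": 21, "condition": "sunny"}' },
  { role: "assistant", content: "It's currently about 21°C and sunny in Helsinki." },
];
```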
So again, what is Context Engineering?
It's about assembling the right information, tools, and constraints at the right moment. Every time the model runs, you're building a temporary workspace that includes:
- System prompt: role, rules, examples
- User message: the current request
- Chat history: recent conversation
- Retrieved information: relevant documents or data
- Available tools: what actions the model can take
- Output format: schemas guiding the response structure
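Put together, a single model call's workspace might assemble roughly these pieces. This is a sketch with made-up content, not a specific provider's request format.

```typescript
// Everything below is context: it all ends up in front of the model as text.
const request = {
  system: "You are a support assistant for Acme. Answer only from the provided docs.",
  messages: [
    // recent chat history
    { role: "user", content: "How do I reset my password?" },
  ],
  retrieved: [
    // just-in-time retrieved documents, trimmed to what's relevant
    { id: "docs/account-security.md", excerpt: "Password resets are done via..." },
  ],
  tools: ["search_docs", "create_support_ticket"], // actions the model may request
  outputSchema: {
    // constrain the shape of the response
    type: "object",
    properties: {
      answer: { type: "string" },
      sources: { type: "array", items: { type: "string" } },
    },
    required: ["answer", "sources"],
  },
};
```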
Building powerful and reliable AI agents is becoming less about finding a magic prompt or waiting for model updates. It is about engineering the context: providing the right information and tools, in the right format, at the right time. It's a cross-functional challenge that involves understanding your business use case, defining your outputs, and structuring all the necessary information so that an LLM can "accomplish the task."
Most AI failures aren't model failures. They're context failures. The model didn't have the right information, constraints, or tools available when it needed them.
Context windows are limited: a context window is the maximum number of tokens a model can process at once. Each time you send a message to a chat model, the entire context (system prompt, chat history, tool results, documents) is included and counts against that limit.
Even large context windows suffer from "attention dilution", meaning that if you pack in too much information, the model loses focus. Good context engineering means including only what's needed, using compressed summaries over raw data, and retrieving information just-in-time rather than loading everything upfront.
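One common tactic is to keep only what fits a token budget, for example keeping the most recent messages and summarizing or dropping the rest. A rough sketch, using a crude characters-to-tokens estimate (real systems use the model's tokenizer):

```typescript
type Message = { role: string; content: string };

// Crude token estimate: roughly 4 characters per token for English text.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

// Keep the most recent messages that fit the budget; older ones are dropped
// here, though in practice you would often summarize them instead.
function trimToBudget(history: Message[], budgetTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (const message of [...history].reverse()) {
    const cost = estimateTokens(message.content);
    if (used + cost > budgetTokens) break;
    kept.unshift(message);
    used += cost;
  }
  return kept;
}
```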
High-quality context leads to high-quality output
LLMs don't remember, don't learn on the fly, and don't run code by themselves. Every time you send data to an LLM, it's a fresh temporary workspace that requires all necessary information to be included in the context. Effective context engineering involves:
- Selecting what goes into context: Prompts, history, documents, tool results—everything the model needs to generate accurate responses.
- Structuring it clearly: Organize context so the model can follow the logic and produce coherent outputs.
- Constraining outputs: Use schemas and tools to guide the model toward producing reliable, structured results (see the example after this list).
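For example, the Vercel AI SDK (mentioned later in this article) lets you constrain output with a zod schema via generateObject. Exact imports and the model name below are assumptions that depend on your setup.

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// The schema constrains what the model is allowed to produce,
// so downstream code can rely on the shape of the result.
const invoiceSchema = z.object({
  vendor: z.string(),
  total: z.number(),
  currency: z.string(),
  lineItems: z.array(z.object({ description: z.string(), amount: z.number() })),
});

const { object } = await generateObject({
  model: openai("gpt-4o"), // model name is just an example
  schema: invoiceSchema,
  prompt: "Extract the invoice details from the following email: ...",
});

console.log(object.total); // typed as number
```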
On Retrieval Augmented Generation (RAG)
When you're working with large information sources like a knowledge base or documentation set, you can't include all the information in each request (because of the context window and, well, costs). RAG became relevant as a way to augment the model's original training data and ground its answers in retrieved data.
Early RAG implementations retrieved context for every user query, but these days most implementations are agentic: the model uses search as a tool and decides when it needs more information and what to search for.
For example, an assistant grounding its answer in retrieved documents might respond with inline citations like this:
TypeScript offers several key benefits:
- Static Type Checking: Catches errors at compile time instead of runtime, making your code more reliable [1]
- Better Developer Experience: Enhanced IDE support with autocomplete and refactoring tools [2]
- Improved Maintainability: Easier to understand and modify code, especially in large codebases [2]
- Team Collaboration: Type definitions serve as documentation, making it easier for teams to work together [2][3]
How RAG works (simplified):
- The user asks a question
- The assistant decides to search the knowledge base and generates a <tool_call> with a search query
- Your application searches the vector database/knowledge base and returns relevant documents with relevance scores
- The retrieved documents are added to the context as a tool message
- The assistant reads the retrieved information and "decides" if it needs to search again or not.
- The assistant synthesizes an answer based on the actual documents.
The challenge is building an effective search system that finds relevant information across different query types. If your project relies heavily on knowledge retrieval, start by building a solid search infrastructure with AI agents as a primary consumer. AI agents can query data in multiple ways: keyword search, semantic search, filtered queries, or even executing code to compute results.
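A sketch of what exposing search as a tool might look like, assuming a hypothetical searchIndex function in front of your search infrastructure (how the tool is wired to the model depends on your SDK):

```typescript
// Hypothetical search backend: could be keyword, semantic (vector), or hybrid.
async function searchIndex(query: string, topK: number) {
  // ...call your actual search infrastructure here...
  return [
    { id: "docs/typescript.md", score: 0.87, excerpt: "TypeScript adds static types..." },
  ];
}

// Tool definition the model sees; the model decides when and what to search.
const searchTool = {
  name: "search_knowledge_base",
  description: "Search the internal documentation. Use specific keywords.",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string" },
      top_k: { type: "number", description: "How many results to return" },
    },
    required: ["query"],
  },
  // Returns documents with relevance scores as a tool message for the model.
  execute: async (args: { query: string; top_k?: number }) =>
    JSON.stringify(await searchIndex(args.query, args.top_k ?? 5)),
};
```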
AI implementations need good user experience (UX)
Consider a scenario: you are using an AI assistant to get medical advice. The model provides an answer that seems plausible, but you have no idea where that information came from. Is it based on current medical guidelines, outdated information, or fabricated text?
Good AI UX builds trust through transparency and verifiability. LLMs can produce incorrect or misleading information even when functioning as intended, so design interfaces that help users verify outputs and recover from errors.
- Be transparent: show the process/context that led to an answer.
- Make it easy to verify citations: ideally in-line, not just tiny footnote links that require reading a 20 page document.
- Use friction wisely: for example on critical actions, require user confirmation before execution.
- Use streaming outputs to let users see what is happening in real time and interrupt if something seems off (see the sketch after this list).
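A small streaming sketch, again assuming the Vercel AI SDK's streamText; the model name and prompt are placeholders.

```typescript
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

// Stream tokens to the user as they are generated instead of waiting
// for the full response; users can read along and interrupt early.
const result = streamText({
  model: openai("gpt-4o"), // model name is just an example
  prompt: "Summarize the following meeting notes: ...",
});

for await (const delta of result.textStream) {
  process.stdout.write(delta); // render incrementally in your UI
}
```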
Not everything needs to be a chat interface either. There is much to be explored in intentfaces and other interaction patterns that suit different use cases better than chat.
How to build intuition about LLMs and context engineering
Getting hands-on experience is the best way to understand how LLMs work and how to design effective context. Here are some steps to get started:
- Learn the basics: Understand what structured outputs and tool calling are. They're the backbone of robust AI products.
- If you have any development skills, you can try our basic tool calling template to build a chatbot using the Vercel AI SDK. It's very bare-bones but shows the core principles. We use it as an assignment in our recruitment process.
- Learn context engineering: Learn how to shape prompts, chain tools, and design flows that produce stable results. Start from guides like Anthropic's context engineering article or OpenAI's prompt engineering guide. There's also Google's prompt engineering guide.
- Nothing beats direct experience, so apply prompt engineering techniques in real projects.
- You can also use LLMs to help you learn prompt engineering. For example, ask an LLM to critique and improve your prompts.
- Try as many different GenAI products as you can: Spot patterns in how they structure context and workflows. Direct experience builds intuition.
- Try different chat apps like ChatGPT, Claude and Gemini. Try to solve something for yourself and see how they manage web search, memory, deep research, and tool use in general.
- AI app builders like Lovable, V0, Figma Make.
- If you are code-curious or already have development chops: look at GitHub Copilot, Claude Code, and ChatGPT Codex.
For more in-depth LLM insights see Andrej Karpathy's lectures or watch 3Blue1Brown's Transformers, the tech behind LLMs. You don't need to understand all the math behind LLMs, but having a mental model of how they work under the hood helps you design better context and prompts.
Over time you'll see that everything is context building: your prompt, the app's hidden prompts, the tools it calls, and the data you feed it. Use that understanding to form your own opinion, rather than echoing hype or skepticism. For more practical insights on building AI products, Simon Willison's writing offers excellent real-world perspectives.
To sum it up
Context makes or breaks AI implementations, and you only really understand how context works (prompts, structures, tools) by actually using and building with them.
The gap between reading about AI and working with it directly is enormous. Theory gives you vocabulary, but practice gives you intuition about what works, what fails, and why. You also get a sense of different models' strengths and weaknesses, and how to design around those. So start experimenting today. Build small things. Break them. Fix them. Build for yourself. Solve a problem you have.
Build your own context.