
Context Engineering for AI Agents: Why RAG Alone Isn't Enough

AI agents need more than document retrieval. Learn how to assemble live context—deploys, incidents, sprint goals, team ownership—that enables agents to make better decisions.


Glue Team

Editorial Team

March 5, 2026·18 min read
context engineering ai, RAG, AI agents, context assembly, engineering agents

Introduction: Why Your AI Agent Gives Bad Answers

Building Glue taught me this lesson the hard way. Our first agent prototype had RAG over the codebase — vector embeddings, semantic search, the works. It could find relevant files in milliseconds. But it still gave bad answers because it had no idea what happened yesterday — the breaking deploy, the three Sentry errors, the Slack thread where the team decided to deprecate that API. Context isn't just code. It's everything happening around the code.

You've deployed an AI agent into your engineering workflow. It has access to your codebase. It has vector embeddings of your documentation. It can retrieve relevant files within milliseconds.

And yet it still gives you bad answers.

It doesn't know that your team deployed a breaking change last night. It doesn't see that three critical errors spiked in the last hour in production. It has no idea that your sprint goal is to ship the payment service by Friday, or that Sarah owns the authorization module, or that your team just made the decision to deprecate the legacy API—in a Slack thread it can't access.

The problem isn't the AI. The problem is context.

Most organizations treat AI agents like static knowledge bases: feed them documents, index them, retrieve them. But engineering work isn't static. It's chaotic, distributed, and constantly changing. Your agent is being asked to help optimize a system that evolves every day—but it's only seeing yesterday's snapshot.

This is where context engineering comes in.

Context engineering is the discipline of assembling the right, timely information for AI agents to act on. It's the difference between an agent that's useful and one that wastes your time. It's why some teams see AI agents multiply their productivity, while others see them as an expensive chatbot.

This post breaks down what context engineering is, why RAG alone falls short, the types of context engineers actually need, and how to build a context layer that keeps your agents in sync with reality.


What Is Context Engineering?

Context engineering is the practice of deliberately assembling and structuring information so that AI agents can reason about it effectively.

Unlike traditional software engineering, where context is implicit (developers know the system because they built it), AI agents have no background knowledge. They can't assume anything. Everything they need to know must be provided—explicitly.

Context engineering answers the question: What information does this agent need, in what format, at what time, to make a good decision?

It's not just about raw data volume. A 10,000-document dump is worthless if the agent can't distinguish between deprecated guidance and current best practices. What matters is:

  • Relevance: Is this information actually needed for this specific task?
  • Timeliness: Is this current, or three sprints old?
  • Clarity: Is this information structured in a way the agent can reason about?
  • Completeness: Does the agent have the full picture, or is it missing critical dependencies?

Context engineering spans three layers:

  1. Data collection — What information exists in your systems?
  2. Data integration — How do you pull it together into a unified model?
  3. Context assembly — How do you select, filter, and rank the right pieces for each request?
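To make the three layers concrete, here's a minimal Python sketch. Everything here is illustrative, not a real API: the function names (`collect`, `integrate`, `assemble`) and the record shape are assumptions.

```python
def collect(sources):
    """Layer 1: pull raw records from each source system (each source is a callable)."""
    return [record for source in sources for record in source()]

def integrate(records):
    """Layer 2: normalize heterogeneous records into one shared shape."""
    return [{"kind": r.get("kind", "unknown"),
             "text": r["text"],
             "age_hours": r.get("age_hours", 0)} for r in records]

def assemble(records, task_keywords, budget=3):
    """Layer 3: select, filter, and rank the right pieces for one request."""
    relevant = [r for r in records
                if any(k in r["text"].lower() for k in task_keywords)]
    relevant.sort(key=lambda r: r["age_hours"])  # fresher context first
    return relevant[:budget]
```

In practice each layer is far richer, but the shape is the same: many messy sources in, one small ranked slice out.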

Organizations that skip this work treat AI agents like they treat junior developers on day one: throw them at the problem and hope for the best. Organizations that invest in context engineering treat AI agents like they treat senior engineers: give them the full context they need to make informed decisions.


Why RAG Falls Short for Engineering Agents

Retrieval-Augmented Generation (RAG) is powerful. It solved a real problem: how to ground LLMs in domain-specific knowledge without fine-tuning.

For many applications—customer support, Q&A over documentation, legal contract analysis—RAG is sufficient. You retrieve relevant documents, pass them to the LLM, and it answers questions about them.

But engineering work breaks RAG's assumptions in three fundamental ways.

First: Engineering context is distributed and heterogeneous.

Your codebase lives in GitHub. Your infrastructure decisions live in Terraform. Your deployment history lives in your CI/CD system. Your incident response lives in PagerDuty. Your sprint planning lives in Jira. Your architectural decisions live in an RFC document from 2022 that nobody can find. Your team's tribal knowledge lives in Slack threads.

RAG assumes you can retrieve from a single, searchable corpus. But engineering teams don't have a single corpus. They have a dozen disconnected systems, each with different APIs, permissions models, and data schemas.

Even if you somehow indexed all of it, classical RAG (vector similarity) doesn't know how to weight this information correctly. Is the README more important than the last three deploy logs? Is the team's Slack channel more relevant than the architecture doc? It depends entirely on the task. RAG can't answer that without explicit context about what the agent is trying to do.

Second: Engineering context requires live, streaming updates.

RAG assumes relatively static knowledge. You can batch-index documents once, then answer questions about them. That works fine for customer support.

But engineering is the opposite of static. Code changes multiple times a day. Deploys happen continuously. Incidents spike and resolve in minutes. A new error appears in your logs that didn't exist this morning. A team member goes on-call and is now the expert on the payment service.

If your agent's context is a snapshot from yesterday's batch indexing job, it's already stale. And for incident response, being 12 hours behind is useless.

Third: Engineering decisions are not just about documents—they're about state.

RAG is excellent at questions like: "What does the documentation say about X?" But engineering work requires different questions:

  • "What are we actually doing?" (not what docs say we should do)
  • "What changed recently?"
  • "What's broken right now?"
  • "Who owns this?"
  • "What was the decision?"

These questions need answers that come from live system state, not from static documents. They need to know the current git branch, the active deploys, the error rates, the on-call schedule, the ticket status.

Documents tell you what the system should do. Context tells you what it is actually doing.

A RAG system might retrieve a document that says "payments are idempotent" (good document). But the agent also needs to know: "A payment idempotency bug was found 3 days ago and we're waiting for review on the fix" (live context). The document alone would give you bad advice.


The Five Types of Context That Engineering Agents Need

Effective context engineering assembles five types of context. Each answers different questions. Each comes from different sources. Each needs to be kept fresh.

1. Code Context

What is the system's actual architecture? What depends on what? Who built it?

Code context includes:

  • Codebase topology: Modules, services, libraries, their relationships and dependencies
  • Code ownership: Which team or person is responsible for each component
  • Architecture and patterns: How is the system organized? What frameworks are in use?
  • Quality signals: Test coverage, complexity metrics, age of the code
  • Change history: Recent commits, who changed what, when

An agent without code context might suggest a refactoring that would break three downstream services. With code context, it knows the dependency graph and can avoid that mistake.

Example: An agent is asked to optimize a slow API endpoint. Without code context, it might suggest adding a cache. With code context, it knows that Redis is down, that there's already a memcached layer that's saturated, and that the real bottleneck is an unindexed database query in a table owned by the data team (who are traveling).

Sources: Git repos, dependency analysis tools, static analysis, CODEOWNERS files, architectural docs.
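The dependency-graph check described above can be as simple as a reverse lookup. A minimal sketch with a hypothetical hard-coded graph (a real system would derive this from import analysis or service manifests):

```python
# Hypothetical dependency graph: service -> services it depends on.
DEPS = {
    "checkout": ["payments", "auth"],
    "invoicing": ["payments"],
    "profile": ["auth"],
}

def downstream_of(service, deps=DEPS):
    """Services that would be affected if `service` changes."""
    return sorted(s for s, uses in deps.items() if service in uses)
```

Before an agent proposes a refactoring of `payments`, it can call `downstream_of("payments")` and surface the blast radius instead of discovering it in production.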

2. Runtime Context

What is happening right now? What deployed recently? What's broken?

Runtime context includes:

  • Deployment state: What version is running in production? When did it deploy? Who deployed it?
  • Error signals: Error rates, error types, error locations, which errors are new
  • Performance metrics: Latency, throughput, resource utilization, which systems are degraded
  • Incidents: Active incidents, recent incidents, incident timeline, impact scope
  • Infrastructure state: Which services are healthy? Which are degraded? What's scaling?

An agent without runtime context might suggest a feature that would make things worse during an incident. With runtime context, it knows the system is under load and can suggest stability-focused actions instead.

Example: An agent is asked to help debug a slow user experience. Without runtime context, it's guessing. With runtime context, it sees that latency spiked in the auth service starting 15 minutes ago, that five error types are correlated with the spike, and that the most recent deploy was 12 minutes ago.

Sources: Monitoring systems (Datadog, New Relic), deployment systems (CI/CD logs), incident tracking (PagerDuty), observability platforms, log aggregation.

3. Process Context

What are we working on? What are the priorities? What's blocking us?

Process context includes:

  • Sprint goals: What are we trying to ship this sprint?
  • Work in progress: What issues are assigned? What's in review? What's blocked?
  • Priority and urgency: What matters most? What can wait?
  • Decisions and RFCs: What have we decided? What trade-offs did we make?
  • Blockers and dependencies: What's slowing us down? Who are we waiting on?

An agent without process context might suggest optimizing something that's not on the sprint plan, or refactor code that's about to be replaced. With process context, it knows what matters this week.

Example: An agent is asked for improvement ideas for the user onboarding flow. Without process context, it might suggest a rewrite. With process context, it knows that onboarding is scheduled for a complete redesign next quarter, so the real value is in small, low-risk improvements that don't conflict with the planned work.

Sources: Project management tools (Jira, Linear), sprint planning docs, RFC docs, Slack channels, team meetings.

4. Historical Context

What happened before? What did we try? What did we learn?

Historical context includes:

  • Past incidents: What broke? How was it fixed? Why did it happen?
  • Past decisions: What have we chosen? What trade-offs did we make? Why?
  • Patterns: What types of problems do we keep hitting? What works for us?
  • Failures and learnings: What didn't work? Why did it fail? What did we learn?
  • System evolution: How has the architecture changed? Why?

An agent without historical context might suggest a solution that was tried before and failed. With historical context, it learns from the team's experience.

Example: An agent is asked to improve database performance. Without historical context, it might suggest switching databases. With historical context, it knows the team tried that three years ago, it was a nightmare, and they decided to optimize the current database instead. It knows which query patterns are expensive because they've been burned by them before.

Sources: Incident post-mortems, past RFCs, git history with commit messages, architectural decision records, Slack history, team retrospectives.

5. Team Context

Who knows what? Who's available? Who's responsible?

Team context includes:

  • Expertise map: Who's the expert on payment systems? On the frontend? On infrastructure?
  • Availability: Who's on-call? Who's in a meeting? Who's on vacation?
  • Ownership: Which team owns which systems? Who's the point person?
  • Team goals: What is this team trying to achieve?
  • Working agreements: How does this team operate? What are the norms?

An agent without team context might ask the wrong person for review, or suggest a solution that conflicts with another team's priorities. With team context, it can coordinate across the org.

Example: An agent is asked to prioritize a refactoring task. Without team context, it might prioritize based on technical merit. With team context, it knows that the backend team is at capacity for the next two weeks, so even a high-impact refactoring should wait. It also knows that Sarah just joined and this would be a great onboarding task for her.

Sources: Org charts, team rosters, on-call schedules, Slack profiles, past collaboration patterns, team wikis.


Building a Context Layer for Engineering Agents

In theory, context engineering is simple: collect all the right information, keep it fresh, and present it to the agent.

In practice, it's hard. Here's what you need to build:

Data Integration

Your context has to come from somewhere. And it lives everywhere.

Start by mapping where each type of context lives:

  • Code context: GitHub API, static analysis tools, your monorepo
  • Runtime context: Datadog, your CI/CD logs, Kubernetes API, PagerDuty
  • Process context: Jira API, your RFC docs, Slack API
  • Historical context: Git history, incident databases, decision logs
  • Team context: Okta, Slack, PagerDuty on-call API

You need to build connectors to each of these systems. Some are easy (REST APIs). Some are hard (Slack requires OAuth, the Slack API rate-limits aggressively, historical search is expensive). Some require custom integrations (your homegrown runbook system that nobody uses anymore but has critical information).

This is not a one-time effort. As your stack evolves, your context layer has to evolve too.
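One common shape for these connectors is a small interface that every source implements, so downstream code never cares where a record came from. A hedged sketch in Python: the class names and record shape are assumptions, and the GitHub connector returns canned data rather than calling the real API.

```python
from abc import ABC, abstractmethod

class Connector(ABC):
    """One connector per source system; each normalizes into the same record shape."""

    @abstractmethod
    def fetch(self) -> list[dict]:
        """Return records as {"kind", "text", "timestamp"} dicts."""

class GitHubConnector(Connector):
    def fetch(self):
        # A real implementation would page through the GitHub REST API here.
        return [{"kind": "commit", "text": "Fix auth timeout",
                 "timestamp": "2026-03-05T14:23:00Z"}]

def collect_all(connectors):
    """Fan out across all configured connectors and merge the results."""
    return [rec for c in connectors for rec in c.fetch()]
```

Adding a new source then means writing one class, not touching the rest of the pipeline.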

Unified Modeling

Once you've collected context, you need to structure it into a unified product model. This is a normalized representation of your whole engineering system.

An agent should be able to ask questions like:

  • "Show me all the errors in the payment service from the last hour"
  • "Who owns the auth service and what was deployed there in the last 24 hours?"
  • "What's blocking the checkout feature?"

These questions require joining data across multiple sources. Without a unified model, you're asking the agent to do the joins itself—and agents are bad at that.

A unified model includes:

  • Services/components: Name, owner, language, framework, dependencies
  • Deployments: Service, version, time, who deployed, outcome
  • Errors: Service, type, count, time, affected users
  • Incidents: Title, services involved, timeline, resolution, learnings
  • Work items: Title, assignee, status, blocked-by relationships, sprint
  • Team members: Name, expertise, on-call schedule, availability

This model lives in a database (often a graph database or a purpose-built product). It gets continuously updated from your source systems.
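A minimal sketch of what such a model and one cross-entity join might look like in Python; the fields shown are a tiny subset of the real model and purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    name: str
    owner: str
    depends_on: list[str] = field(default_factory=list)

@dataclass
class Deployment:
    service: str
    version: str
    deployed_by: str
    hours_ago: float

def deploys_for_owner(owner, services, deploys, within_hours=24):
    """Answer 'who owns X and what deployed there in the last 24 hours?'
    by joining services to deployments on the service name."""
    owned = {s.name for s in services if s.owner == owner}
    return [d for d in deploys
            if d.service in owned and d.hours_ago <= within_hours]
```

The point is that the join happens in the context layer, so the agent receives an already-connected answer instead of raw exports from two systems.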

Freshness and Consistency

Context is only useful if it's current. If your view of the system is two hours behind the latest deploy, that's stale context for incident response.

You need a strategy for freshness:

  • Real-time: Some context (active incidents, current deployments, live error rates) should be fetched on-demand, fresh each time
  • Near-real-time: Some context (recent deploys from the last 24 hours, current sprint work) can be cached for 5-15 minutes
  • Batch updates: Some context (code architecture, historical patterns) can be updated once a day

Don't try to make everything real-time. Real-time is expensive and often unnecessary. Make the high-value, high-change context real-time, and cache the rest.
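The tiered strategy above can be sketched as a per-type TTL cache; the context types and TTL values here are illustrative, not recommendations:

```python
import time

# Per-type freshness policy in seconds; 0 means always fetch fresh.
TTL = {"incidents": 0, "sprint_work": 600, "architecture": 86400}

class ContextCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # kind -> (fetched_at, value)

    def get(self, kind, fetch):
        """Return cached context if still fresh for this kind, else refetch."""
        ttl = TTL.get(kind, 0)  # unknown kinds default to always-fresh
        cached = self._store.get(kind)
        if ttl and cached and self._clock() - cached[0] < ttl:
            return cached[1]
        value = fetch()
        self._store[kind] = (self._clock(), value)
        return value
```

The safe default matters: anything without an explicit policy is fetched fresh, so a forgotten context type degrades to slow rather than stale.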

Also, maintain consistency. If your unified model says Sarah owns the payment service, but your on-call system says Marcus is on-call for it, that's a bug. Build validation into your context layer.

Relevance and Ranking

Not all context is equally relevant. If you dump the entire context layer into a prompt, you'll overwhelm the LLM and get worse answers.

You need ranking and filtering:

  • Task-based filtering: Different tasks need different context. A task to "fix a performance issue" doesn't need the complete git history of the payment service—it needs recent deploys, current metrics, and recent changes
  • Temporal ranking: Recent information is usually more relevant than old information
  • Specificity ranking: Specific information (your error logs) is usually more relevant than generic information (a best practices doc)
  • Ownership ranking: Information from the team that owns the system is usually more relevant than information from other teams

You'll need custom logic (probably rules-based, possibly ML) to decide what context to include for each request.
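Here's what a rules-based ranker along these four dimensions could look like; the weights and record fields are illustrative placeholders, not tuned values:

```python
def score(record, task_keywords, owning_team):
    """Rules-based relevance score combining the four rankings above."""
    s = 0.0
    text = record["text"].lower()
    if any(k in text for k in task_keywords):
        s += 3.0                                    # task-based filtering
    s += max(0.0, 2.0 - record["age_days"] / 7)     # temporal: fresher wins
    if record["kind"] in ("error_log", "deploy"):
        s += 1.0                                    # specificity: live signals
    if record.get("team") == owning_team:
        s += 1.0                                    # ownership
    return s

def rank(records, task_keywords, owning_team, top_k=5):
    return sorted(records,
                  key=lambda r: -score(r, task_keywords, owning_team))[:top_k]
```

Even a crude scorer like this beats dumping everything into the prompt, because it encodes the judgment "recent, specific, owned" that RAG similarity alone can't express.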


Context Engineering in Practice

Here's what context engineering looks like in the real world.

Bad Context Assembly

Scenario: Your agent is asked to help debug a slow API.

Bad context:

Here's documentation on REST API best practices. Here are all 400 lines
of your API code. Here's a tutorial on optimizing database queries.
Here's your team roster.

Why it's bad: The agent is drowning in irrelevant information. It doesn't know what the actual problem is. It's guessing.

Good Context Assembly

Scenario: Same question—help debug a slow API.

Good context:

INCIDENT: User onboarding API latency spike

Active Now:
- Service: user-onboarding-api (owned by: Growth Team)
- Latency: 2.3s (normal: 350ms) for last 15 minutes
- Error rate: 0.2% (normal: 0.01%)

Recent Changes (last 24h):
- Deploy: 2026-03-05 14:23 UTC by alice@company.com
  Commit: "Add user analytics tracking"
  Lines changed: +45 in analytics.js

Current Metrics:
- P99 latency: 2.8s
- Errors: mostly timeout errors in the analytics service
- Database query time: 1.8s (slowest query: INSERT INTO user_events)

Code Context:
- analytics.js synchronously calls analytics service
- analytics service has no timeout (blocking)
- This is a known bottleneck (see RFC #203)

Team Context:
- Growth Team (Javi, Marcus) owns this service
- Javi is on-call and available
- Analytics service owned by Data Team (Marcus, unavailable until tomorrow)

Similar Incident (6 months ago):
- Analytics service rate limiting caused same latency spike
- Fix: implement async analytics calls (in progress for 3 months)

Why it's good: The agent knows the actual problem (one specific change, one specific service), knows who to talk to, knows the history, and knows what the actual architecture looks like. It can give focused, actionable advice.

The difference is night and day. The second version takes more engineering effort—but the output is 10x more useful.
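One way to produce a block like the good one above is to keep context structured in the unified model and render it to text only at request time. A minimal sketch with hypothetical field names:

```python
def render_context(incident):
    """Render a structured incident record into the compact text block
    the agent receives. The field names here are illustrative."""
    lines = [f"INCIDENT: {incident['title']}", "", "Active Now:"]
    lines += [f"- {key}: {value}" for key, value in incident["active"].items()]
    lines += ["", "Recent Changes (last 24h):"]
    lines += [f"- {change}" for change in incident["changes"]]
    return "\n".join(lines)
```

Keeping the data structured until the last step means the same model can feed an agent prompt, a dashboard, or a Slack alert without three separate pipelines.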

Another Example: Code Review

Bad context:

Here's a 300-line diff. Here's documentation on code review best practices.

Good context:

PULL REQUEST: "Add caching to user profile queries"

Code Context:
- File: src/services/userService.ts (owned by Platform Team)
- Depends on: cacheLayer (owned by Infra Team), userRepository (used by 15 services)
- Impacted services: 3 (user-onboarding, user-profile, user-settings)

Change Summary:
- Adds in-memory cache with 5-minute TTL
- Affects userRepository.findById() which is called 50k times/day

Risk Assessment:
- Cache invalidation: PR doesn't include invalidation on user updates
- Similar issue found in: payment service (2 months ago, took 3 hours to debug)

Ownership:
- PR author: alice (Platform Team, 2 years on codebase)
- Reviewer assigned: bob (Platform Team, 4 years, owned cache layer)
- But: userRepository affects Data Team (might need alignment)

Related Work:
- Similar caching effort (user sessions): merged 2 weeks ago
- Caching strategy RFC (RFC #150): decided on 5-min TTL for read-heavy queries

Historical Context:
- Cache bugs in this service: 3 in the last year (all invalidation-related)
- Bob recommended: "always pair cache invalidation review with an e2e test"

Again, the difference is in actionability. With good context, the agent knows what to focus on, what the risks are, and what expertise to apply.


FAQ

Q: Isn't building a context layer a lot of engineering work?

A: Yes. It is. But consider the alternative: your AI agents make bad decisions because they lack context, and your team doesn't trust them. That's also expensive—it's just spread across many small decisions instead of one upfront investment. For most organizations, the context layer pays for itself within months.

Q: Can't I just use vector search over everything?

A: Vector search (RAG) is great for finding documents that are semantically similar to a query. It's terrible for finding the right context for a task. "Give me all documentation about caching" is a great vector search query. "Give me the context I need to review this specific PR" is not. You'll need both—vector search for exploratory questions, and structured context for task-specific work.

Q: How much context should I include in each request?

A: This is the hard part. Empirically, somewhere between 2,000 and 8,000 tokens of context works best for most engineering tasks. More than that and you get diminishing returns. Less than that and you're probably missing something important. Start with a target of 4,000 tokens and iterate based on the quality of the agent's output.
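A simple way to enforce such a budget is greedy packing of already-ranked snippets; the words-to-tokens ratio below is a rough heuristic, not a real tokenizer:

```python
def trim_to_budget(snippets, budget_tokens=4000):
    """Greedy packing: keep the highest-ranked snippets that fit the budget.
    Assumes `snippets` is already sorted best-first; token counts are a
    rough words * 1.3 estimate, not a real tokenizer."""
    kept, used = [], 0
    for text in snippets:
        cost = int(len(text.split()) * 1.3)
        if used + cost > budget_tokens:
            continue  # skip, but keep trying smaller snippets
        kept.append(text)
        used += cost
    return kept, used
```

For production use you'd swap the estimate for your model's actual tokenizer, but the shape of the logic stays the same.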

Q: How often do I need to refresh the context?

A: Depends on the type. Deployment state, active incidents, and error rates should be fetched on-demand (sub-second). Work-in-progress status can be cached for 5-15 minutes. Code ownership and architecture can be cached for hours. Historical context (past incidents, decisions) rarely changes. You'll need different refresh strategies for different types of context. Don't try to make everything real-time.


Related Reading

  • AI Agents for Engineering Teams: From Copilot to Autonomous Ops
  • Engineering Copilot vs Agent: Why Autocomplete Isn't Enough
  • AI for CTOs: The Agent Stack You Need in 2026
  • AI for Engineering Leaders: A Strategic Guide to Agentic AI Adoption
  • Devin AI Alternatives: Why You Need Agents That Monitor, Not Just Code
  • AI Code Assistant vs Codebase Intelligence
