You’re deep in a coding session with Claude, GPT, or your model of choice. Three hours in, you notice the responses getting off. Suggestions that don’t quite fit. Code that contradicts earlier decisions. Solutions that ignore context you established ages ago.
Sounds like you've hit the context wall.
The quality of your AI outputs degrades long before you run out of tokens. Understanding when and how to compact your context isn’t just an optimization. It is the difference between an AI that helps you ship and one that wastes your afternoon.
The Context Window Reality
Modern LLMs operate with massive context windows. Claude offers 200K tokens. GPT-4 provides 128K. These numbers sound infinite when you’re starting fresh, but here’s the truth most developers learn the hard way:
At around 40% context utilization, you'll start seeing degradation.
Not failures. Not errors. Just subtle drift. The model starts forgetting earlier decisions. It suggests code patterns it already rejected. It loses the thread of your architectural choices.
This isn’t a bug. It’s an inherent limitation of attention mechanisms across long sequences. The model is trying to maintain coherence across thousands of lines of conversation, and some details inevitably fade into noise.
The Naive Approach
Most people interact with LLMs the way they’d work with a junior developer on chat:
- Ask for something
- Get a result that’s wrong
- Explain why it’s wrong
- Steer toward what you actually want
- Repeat until it works
This cycle is fine for simple tasks. But for complex development work, it’s a context killer. Each correction adds more tokens. Each “no, not like that” adds more confusion. Soon you’re spending more time correcting than building.
The model isn't learning from your corrections; it's actually confused by them.
Three Context Management Strategies
1. Context Refresh: The Quick Reset
You’re working on a feature. The conversation drifts off topic: maybe you explored a dead-end approach, maybe you got sidetracked debugging something unrelated. The solution isn’t re-steering the conversation. It’s a clean restart.
When to refresh:
- You’ve gone down a rabbit hole that didn’t pan out
- The conversation has too many failed attempts cluttering the history
- You need to pivot direction entirely
How to do it: Start a new conversation. Provide only what matters: the current working state, the goal, and maybe one sentence about what didn’t work. Don’t drag in the entire exploratory journey.
Example refresh prompt:
I'm building a REST API for [purpose]. Current state: [working code path].
Need to add [specific feature]. Previous approach using [method] had issues with [brief issue].
Now we have a clean slate with minimal baggage.
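If you refresh often, a tiny template helper keeps your restart prompts consistent. A minimal sketch in Python; the function and field names here are illustrative, not part of any tool:

```python
def build_refresh_prompt(purpose, working_state, feature, failed_method, issue):
    """Assemble a clean-restart prompt that carries only the essentials:
    current state, the goal, and one line about what didn't work."""
    return (
        f"I'm building a REST API for {purpose}. "
        f"Current state: {working_state}.\n"
        f"Need to add {feature}. "
        f"Previous approach using {failed_method} had issues with {issue}."
    )

prompt = build_refresh_prompt(
    purpose="an inventory service",
    working_state="CRUD endpoints in /src/api/items.js, tests passing",
    feature="rate limiting",
    failed_method="in-memory counters",
    issue="state resetting on every deploy",
)
```

The point isn't the code; it's the discipline of forcing yourself to name the state, the goal, and the one lesson learned before you hit send.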
2. Intentional Compaction: Checkpointing Your Progress
This is the power move. Whether your conversation is going well or poorly, at regular intervals you deliberately compress your context into a portable artifact.
When to compact:
- Every major milestone (feature complete, tests passing, architecture decided)
- Before switching focus areas
- When you hit that 40% context threshold feeling
- End of a work session
What goes into a compaction:
The compaction document becomes your save point. Think of it as a sophisticated README that the AI can load to understand exactly where you are.
Structure your compaction:
# Project: [Name]
## Current Objective
[What you're building right now]
## Working Implementation
Path: /src/main/feature.js
- Core logic implemented
- Handles edge cases: X, Y, Z
- Integration points: A, B
- Known working since: [timestamp/commit]
## Attempted Approaches (Not Working)
Path: /archive/feature-attempt-1.js
- Tried: [approach]
- Issue: [specific problem]
- Why abandoned: [reason]
- Don't revisit: [antipatterns learned]
## Key Decisions & Constraints
- Using [library] because [reason]
- Avoiding [approach] due to [constraint]
- Architecture choice: [decision] over [alternative]
## Current Blockers/Questions
1. [Specific issue]
2. [Open question]
## Next Steps
- [ ] [Concrete action]
- [ ] [Concrete action]
Pass this document to a fresh AI conversation and it’ll be up to speed in one exchange instead of thirty.
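If you compact at every milestone, it can help to keep the save point as structured data and render the markdown from it. A rough sketch covering a subset of the sections above; the schema is an assumption, not a standard:

```python
def render_compaction(state):
    """Render a compaction dict into a markdown save-point document."""
    lines = [f"# Project: {state['name']}", ""]
    lines += ["## Current Objective", state["objective"], ""]
    lines += ["## Key Decisions & Constraints"]
    lines += [f"- {decision}" for decision in state["decisions"]]
    lines += ["", "## Next Steps"]
    lines += [f"- [ ] {step}" for step in state["next_steps"]]
    return "\n".join(lines)

doc = render_compaction({
    "name": "Inventory API",
    "objective": "Add rate limiting to the items endpoints",
    "decisions": [
        "Using Redis for counters because in-memory state resets on deploy",
    ],
    "next_steps": ["Wire the limiter middleware into /src/api/items.js"],
})
```

Keeping the state structured also means you can diff two compactions and see what actually changed between sessions.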
3. Sub Agents: Delegating Without Bloating
Another technique that is often overlooked: not every question needs to happen in your main context window.
You’re building a feature and need to understand how a library works. Don’t ask in your primary conversation. It will stuff dozens of messages about API documentation into your precious context. Instead, open a separate conversation, get your answer, bring back only the essential information.
Use sub agents for:
- Library research and API exploration
- Debugging discrete issues
- Exploring alternative approaches
- Generating test data or fixtures
- Refactoring isolated components
Your main conversation stays focused on the core of your work. Sub agents handle the tangents.
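In API terms, the pattern looks something like this sketch: run the tangent in a throwaway conversation and merge back only a summary. Here `call` stands in for whatever client wrapper you use (a function taking a list of messages and returning the reply string); it is a placeholder, not a real library function:

```python
def research_in_subagent(question, call):
    """Run a research tangent in its own throwaway conversation.

    `call` is your LLM client wrapper: it takes a list of
    {"role", "content"} messages and returns the reply as a string.
    Only a compressed summary is returned to the main conversation.
    """
    # First pass: ask the question in an isolated conversation.
    answer = call([{"role": "user", "content": question}])
    # Second pass: compress the answer before it enters your main context.
    summary = call([{
        "role": "user",
        "content": f"Summarize in 3 short bullet points:\n{answer}",
    }])
    return summary  # only this lands in your primary conversation
```

The dozens of messages of API documentation stay in the side conversation and get discarded; your main context pays only for the three bullet points.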
What Actually Takes Up Space
Not all context is created equal. Understanding what bloats your window helps you make better decisions:
High cost:
- Full code files (especially when repeated multiple times)
- Error outputs (stack traces)
- API documentation dumps
- Back and forth debugging cycles
- Exploratory dead ends you didn’t clean up
Low cost:
- File paths and structure
- Brief summaries of what code does
- Decision records
- Clear, specific questions
Pro tip: Don’t paste entire files unless absolutely necessary. Reference them by path and describe their purpose. The AI can ask for specific sections if needed.
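You can keep a rough eye on utilization without any tooling. A common heuristic (an approximation, not a tokenizer) is about four characters per token for English text and code:

```python
def estimate_tokens(text):
    """Rough token estimate: ~4 characters per token. Heuristic only."""
    return len(text) // 4

def utilization(conversation_text, window=200_000):
    """Fraction of a context window used, defaulting to a 200K-token window."""
    return estimate_tokens(conversation_text) / window

# Warn when you cross the degradation zone described earlier.
if utilization(conversation_text := "x" * 400_000) > 0.40:
    print("Past 40% utilization: consider compacting.")
```

Real clients expose exact token counts, but even this crude check is enough to tell you when a pasted file is about to eat a tenth of your window.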
Context Engineering as a Practice
The best AI assisted developers aren’t the ones who know the cleverest prompts. They’re the ones who manage context like it’s their most valuable resource—because it is.
Before you hit send on a message, ask:
- Does this information move us forward?
- Have I already said this in different words?
- Am I adding correction on top of correction instead of restarting clean?
- Would a fresh conversation with a compaction be faster?
The Practical Workflow
Here’s what effective context management looks like in practice:
Hour 1: Fresh start, clear goal, building momentum. Everything fits in working memory.
Hour 2: You’ve generated code, had a few corrections, made decisions. Still good, but you’re feeling the accumulation.
Hour 3: Time for intentional compaction. Export your progress to markdown. Document decisions. Start a fresh conversation with the compaction loaded.
Hour 4: New context, same progress. You’re building on solid ground again instead of navigating around conversational debris.
This cycle of build, compact, refresh, repeat keeps you in the high-quality zone where AI assistance actually multiplies your productivity instead of adding friction.
The Bottom Line
Context windows are large, but they’re not infinite in practice. Quality degrades long before quantity runs out.
Treat your context like RAM: keep it clean, compress regularly, and don’t be afraid to restart when things get messy. The best developers using AI aren’t having marathon conversations; they’re having focused 30-minute sessions with strategic reset points.
Your AI assistant is powerful, but it’s also stateless and attention limited. Feed it well, keep it focused, and compact often. That’s how you maintain excellent planning and excellent implementations from first prompt to final deployment.
Managing context well is managing your own productivity. Master the compact, refresh, and restart cycle, and you’ll get more done with AI in an afternoon than most developers manage in a week.