You’re deep in a coding session with Claude, GPT, or your model of choice. Three hours in, you notice the responses getting off. Suggestions that don’t quite fit. Code that contradicts earlier decisions. Solutions that ignore context you established ages ago.
Sounds like you've hit the context wall.
The quality of your AI outputs degrades long before you run out of tokens. Understanding when and how to compact your context isn’t just an optimization. It is the difference between an AI that helps you ship and one that wastes your afternoon.
The Context Window Reality
Modern LLMs operate with massive context windows. Claude offers 200K tokens. GPT-4 provides 128K. These numbers sound infinite when you’re starting fresh, but here’s the truth most developers learn the hard way:
At around 40% context utilization, you'll start seeing degradation.
Not failures. Not errors. Just subtle drift. The model starts forgetting earlier decisions. It suggests code patterns it already rejected. It loses the thread of your architectural choices.
This isn’t a bug. It’s an inherent limitation of attention mechanisms across long sequences. The model is trying to maintain coherence across thousands of lines of conversation, and some details inevitably fade into noise.
The Naive Approach
Most people interact with LLMs the way they’d work with a junior developer on chat:
- Ask for something
- Get a result that’s wrong
- Explain why it’s wrong
- Steer toward what you actually want
- Repeat until it works
This cycle is fine for simple tasks. But for complex development work, it’s a context killer. Each correction adds more tokens. Each “no, not like that” adds more confusion. Soon you’re spending more time correcting than building.
The model isn't learning from your corrections; it's actually confused by them.
Three Context Management Strategies
1. Context Refresh: The Quick Reset
You’re working on a feature. The conversation drifts off topic: maybe you explored a dead-end approach, maybe you got sidetracked debugging something unrelated. The solution isn’t re-steering the conversation. It’s a clean restart.
When to refresh:
- You’ve gone down a rabbit hole that didn’t pan out
- The conversation has too many failed attempts cluttering the history
- You need to pivot direction entirely
How to do it: Start a new conversation. Provide only what matters: the current working state, the goal, and maybe one sentence about what didn’t work. Don’t drag in the entire exploratory journey.
Example refresh prompt:
I'm building a REST API for [purpose]. Current state: [working code path].
Need to add [specific feature]. Previous approach using [method] had issues with [brief issue].
Now we have a clean slate with minimal baggage.
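If you refresh often, a tiny template helper keeps your restart prompts consistent. A minimal sketch in Python; the function and field names here are illustrative, not part of any tool:

```python
def build_refresh_prompt(purpose, working_state, feature, failed_method, issue):
    """Assemble a clean-restart prompt that carries only the essentials:
    current state, the goal, and one line about what didn't work."""
    return (
        f"I'm building a REST API for {purpose}. "
        f"Current state: {working_state}.\n"
        f"Need to add {feature}. "
        f"Previous approach using {failed_method} had issues with {issue}."
    )

prompt = build_refresh_prompt(
    purpose="an inventory service",
    working_state="CRUD endpoints in /src/api/items.js, tests passing",
    feature="rate limiting",
    failed_method="in-memory counters",
    issue="state resetting on every deploy",
)
```

The point isn't the code; it's the discipline of forcing yourself to name the state, the goal, and the one lesson learned before you hit send.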
2. Intentional Compaction: Checkpointing Your Progress
This is the power move. Whether your conversation is going well or poorly, at regular intervals you deliberately compress your context into a portable artifact.
When to compact:
- Every major milestone (feature complete, tests passing, architecture decided)
- Before switching focus areas
- When you hit that 40% context threshold feeling
- End of a work session
What goes into a compaction:
The compaction document becomes your save point. Think of it as a sophisticated README that the AI can load to understand exactly where you are.
Structure your compaction:
# Project: [Name]
## Current Objective
[What you're building right now]
## Working Implementation
Path: /src/main/feature.js
- Core logic implemented
- Handles edge cases: X, Y, Z
- Integration points: A, B
- Known working since: [timestamp/commit]
## Attempted Approaches (Not Working)
Path: /archive/feature-attempt-1.js
- Tried: [approach]
- Issue: [specific problem]
- Why abandoned: [reason]
- Don't revisit: [antipatterns learned]
## Key Decisions & Constraints
- Using [library] because [reason]
- Avoiding [approach] due to [constraint]
- Architecture choice: [decision] over [alternative]
## Current Blockers/Questions
1. [Specific issue]
2. [Open question]
## Next Steps
- [ ] [Concrete action]
- [ ] [Concrete action]
Pass this document to a fresh AI conversation and it’ll be up to speed in one exchange instead of thirty.
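If you compact at every milestone, it can help to keep the save point as structured data and render the markdown from it. A rough sketch covering a subset of the sections above; the schema is an assumption, not a standard:

```python
def render_compaction(state):
    """Render a compaction dict into a markdown save-point document."""
    lines = [f"# Project: {state['name']}", ""]
    lines += ["## Current Objective", state["objective"], ""]
    lines += ["## Key Decisions & Constraints"]
    lines += [f"- {decision}" for decision in state["decisions"]]
    lines += ["", "## Next Steps"]
    lines += [f"- [ ] {step}" for step in state["next_steps"]]
    return "\n".join(lines)

doc = render_compaction({
    "name": "Inventory API",
    "objective": "Add rate limiting to the items endpoints",
    "decisions": [
        "Using Redis for counters because in-memory state resets on deploy",
    ],
    "next_steps": ["Wire the limiter middleware into /src/api/items.js"],
})
```

Keeping the state structured also means you can diff two compactions and see what actually changed between sessions.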
3. Sub Agents: Delegating Without Bloating
Another technique that is often overlooked: not every question needs to happen in your main context window.
You’re building a feature and need to understand how a library works. Don’t ask in your primary conversation. It will stuff dozens of messages about API documentation into your precious context. Instead, open a separate conversation, get your answer, bring back only the essential information.
Use sub agents for:
- Library research and API exploration
- Debugging discrete issues
- Exploring alternative approaches
- Generating test data or fixtures
- Refactoring isolated components
Your main conversation stays focused on the core of your work. Sub agents handle the tangents.
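In API terms, the pattern looks something like this sketch: run the tangent in a throwaway conversation and merge back only a summary. Here `call` stands in for whatever client wrapper you use (a function taking a list of messages and returning the reply string); it is a placeholder, not a real library function:

```python
def research_in_subagent(question, call):
    """Run a research tangent in its own throwaway conversation.

    `call` is your LLM client wrapper: it takes a list of
    {"role", "content"} messages and returns the reply as a string.
    Only a compressed summary is returned to the main conversation.
    """
    # First pass: ask the question in an isolated conversation.
    answer = call([{"role": "user", "content": question}])
    # Second pass: compress the answer before it enters your main context.
    summary = call([{
        "role": "user",
        "content": f"Summarize in 3 short bullet points:\n{answer}",
    }])
    return summary  # only this lands in your primary conversation
```

The dozens of messages of API documentation stay in the side conversation and get discarded; your main context pays only for the three bullet points.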
What Actually Takes Up Space
Not all context is created equal. Understanding what bloats your window helps you make better decisions:
High cost:
- Full code files (especially when repeated multiple times)
- Error outputs (stack traces)
- API documentation dumps
- Back and forth debugging cycles
- Exploratory dead ends you didn’t clean up
Low cost:
- File paths and structure
- Brief summaries of what code does
- Decision records
- Clear, specific questions
Pro tip: Don’t paste entire files unless absolutely necessary. Reference them by path and describe their purpose. The AI can ask for specific sections if needed.
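You can keep a rough eye on utilization without any tooling. A common heuristic (an approximation, not a tokenizer) is about four characters per token for English text and code:

```python
def estimate_tokens(text):
    """Rough token estimate: ~4 characters per token. Heuristic only."""
    return len(text) // 4

def utilization(conversation_text, window=200_000):
    """Fraction of a context window used, defaulting to a 200K-token window."""
    return estimate_tokens(conversation_text) / window

# Warn when you cross the degradation zone described earlier.
if utilization(conversation_text := "x" * 400_000) > 0.40:
    print("Past 40% utilization: consider compacting.")
```

Real clients expose exact token counts, but even this crude check is enough to tell you when a pasted file is about to eat a tenth of your window.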
Context Engineering as a Practice
The best AI assisted developers aren’t the ones who know the cleverest prompts. They’re the ones who manage context like it’s their most valuable resource—because it is.
Before you hit send on a message, ask:
- Does this information move us forward?
- Have I already said this in different words?
- Am I adding correction on top of correction instead of restarting clean?
- Would a fresh conversation with a compaction be faster?
The Practical Workflow
Here’s what effective context management looks like in practice:
Hour 1: Fresh start, clear goal, building momentum. Everything fits in working memory.
Hour 2: You’ve generated code, had a few corrections, made decisions. Still good, but you’re feeling the accumulation.
Hour 3: Time for intentional compaction. Export your progress to markdown. Document decisions. Start a fresh conversation with the compaction loaded.
Hour 4: New context, same progress. You’re building on solid ground again instead of navigating around conversational debris.
This cycle of build, compact, refresh, repeat keeps you in the high-quality zone where AI assistance actually multiplies your productivity instead of adding friction.
The Bottom Line
Context windows are large, but they’re not infinite in practice. Quality degrades long before quantity runs out.
Treat your context like RAM: keep it clean, compress regularly, and don’t be afraid to restart when things get messy. The best developers using AI aren’t having marathon conversations; they’re having focused 30-minute sessions with strategic reset points.
Your AI assistant is powerful, but it’s also stateless and attention limited. Feed it well, keep it focused, and compact often. That’s how you maintain excellent planning and excellent implementations from first prompt to final deployment.
Managing context well is managing your own productivity. Master the compact, refresh, and restart cycle, and you’ll get more done with AI in an afternoon than most developers manage in a week.