I’ve been building MCP tools for a few months now, and I kept running into the same problem from different angles. In new sessions, the agent doesn’t know enough. It doesn’t know your project’s conventions. It doesn’t know which files are related to the one you’re changing. It doesn’t know that your team uses barrel exports or that your test files follow a specific naming pattern. Every session, we spend tokens reacquiring the same context.

The usual fix is static instructions. Drop a skill file, beef up your CLAUDE.md, hope the agent reads it and retains enough to be useful. But static text files are the wrong abstraction for dynamic project context. The context changes with every commit. It’s specific to each repo, each team. No amount of upfront configuration can capture that. What’s missing is a context layer that any tool can query on demand.

The Pattern

The architecture that keeps emerging is three layers working together.

A structured database stores deterministic facts about your codebase. File relationships, import graphs, framework detection, naming conventions. Things that have definitive answers and can be queried precisely.
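To make the deterministic layer concrete, here’s a minimal sketch of a fact store queried by exact relationships. Everything here is illustrative: the `FileFact` shape, the `FactIndex` class, and the in-memory map standing in for a real database.

```typescript
// Hypothetical shape for one file's deterministic facts.
interface FileFact {
  path: string;
  imports: string[]; // resolved import targets
  framework?: string; // e.g. detected from package.json
}

// In-memory stand-in for the structured database. Queries here have
// definitive answers: no ranking, no similarity, just exact lookups.
class FactIndex {
  private files = new Map<string, FileFact>();

  add(fact: FileFact): void {
    this.files.set(fact.path, fact);
  }

  // Precise, deterministic query: which files import a given module?
  importersOf(path: string): string[] {
    return [...this.files.values()]
      .filter((f) => f.imports.includes(path))
      .map((f) => f.path);
  }
}
```

A real implementation would back this with SQLite or similar, but the query surface stays the same: exact questions, exact answers.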

A vector database sits alongside it for fuzzy retrieval. When the agent needs to find “files related to authentication” or “code that handles error boundaries,” it can search by meaning rather than by exact path or keyword.
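The fuzzy layer reduces to nearest-neighbor search over embeddings. This sketch brute-forces cosine similarity; a real deployment would use Pinecone, Vectorize, or another vector store, and the raw number arrays here stand in for actual model embeddings.

```typescript
// Cosine similarity between two embedding vectors (assumed non-zero).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// "Search by meaning": rank indexed files by similarity to the query
// embedding and return the top k paths.
function searchByMeaning(
  query: number[],
  index: { path: string; vec: number[] }[],
  k = 3,
): string[] {
  return [...index]
    .sort((x, y) => cosine(query, y.vec) - cosine(query, x.vec))
    .slice(0, k)
    .map((e) => e.path);
}
```

Note what’s different from the structured layer: the answer is a ranking, not a fact, which is exactly why the two layers coexist rather than compete.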

An MCP tool interface ties both layers together and exposes them to the agent as callable tools. The agent doesn’t need to know how the data is stored or indexed. It just calls get_conventions or search_codebase and gets back what it needs.
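The tool boundary can be sketched as a dispatch map: the agent names a tool, the server routes the call to whichever layer answers it. The tool names mirror the ones in the text, but the handler bodies are hypothetical placeholders; a real server would register these through the MCP SDK rather than a plain object.

```typescript
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical handlers: one backed by the structured layer, one by the
// vector layer. The agent sees neither distinction, only tool names.
const tools: Record<string, ToolHandler> = {
  get_conventions: () => ({ errorHandling: "Result-style returns" }),
  search_codebase: (args) => [`(vector hit for "${args.query}")`],
};

function callTool(name: string, args: Record<string, unknown> = {}): unknown {
  const handler = tools[name];
  if (!handler) throw new Error(`unknown tool: ${name}`);
  return handler(args);
}
```

The point of the indirection is that storage can change underneath — swap the vector store, add a cache — without the agent-facing contract moving at all.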

In practice this looks like: the agent is reviewing a PR that touches auth/middleware.ts. It calls get_related_files and discovers the test file, the route config, and three other middleware files that follow the same pattern. It calls get_conventions and learns the project uses a specific error handling style across all middleware. It didn’t need instructions to find any of that. It queried for it.

MCP is what makes this work as an interface. It gives agents a standardized way to pull context mid-conversation, changing the dynamic from “front-load everything into a prompt” to “retrieve what you need when you need it.”

Speed Has to Be Invisible

This pattern only works if retrieval is fast enough that nobody notices it happening.

The vector search itself is the easy part. Pinecone benchmarks show sub-2ms query latency at 95% recall for moderate datasets (VectorView). Cloudflare Vectorize brought median query latency down from 549ms to 31ms when it went GA (Cloudflare blog). At repo scale, vector search is effectively free.

The real bottleneck is the embedding step. If you’re hitting an external API, you’re adding latency you can’t control. Milvus’s testing found that embedding API calls take hundreds of milliseconds to several seconds, while actual database operations add only 20-40ms (Milvus blog). Nixiesearch’s benchmark of major providers showed volatile latency with unpredictable spikes (Nixiesearch).

Running embeddings locally changes the equation. With QInt8 quantization, a 500M parameter model on CPU through ONNX runs in roughly 10ms per query (Nixiesearch). That’s an order of magnitude faster than any API call and eliminates the network dependency entirely.

Layer a cache on top and it gets even better. Project architecture and conventions don’t change between commits. Cache those responses with a reasonable TTL and the most common context queries never touch the vector layer at all. The worst-case round trip is under 100ms. In a Claude Code session where the model takes seconds to respond, that’s invisible.
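The cache itself is nothing exotic; a minimal TTL cache looks like this. The injectable clock is just to make expiry testable, and the durations are illustrative.

```typescript
// TTL cache sketch for context queries: convention/architecture answers
// are served from memory until they expire, so repeated queries skip
// the vector layer entirely.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();

  constructor(
    private ttlMs: number,
    private now: () => number = Date.now, // injectable for testing
  ) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry || entry.expires <= this.now()) return undefined;
    return entry.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expires: this.now() + this.ttlMs });
  }
}
```

A stale-but-expired entry simply misses, which falls through to the live layers — so the TTL bounds how old a served answer can ever be.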

Keeping the Index Fresh

The “index once, query from anywhere” promise only holds if the index stays current. Stale context is worse than no context because the agent trusts what it gets back.

The approach I’m using is incremental re-indexing through SHA diffing. The indexer stores the tree SHA and individual file SHAs from the last run. On the next index, it compares against the current state and only reprocesses files that changed. For a typical development session where you’ve touched a handful of files, re-indexing takes seconds.
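The diffing step can be sketched in a few lines. For simplicity this hashes file contents directly rather than reading git’s tree and blob SHAs, but the comparison logic is the same: anything whose hash doesn’t match the last snapshot gets reprocessed.

```typescript
import { createHash } from "node:crypto";

// Content hash standing in for git's blob SHA.
function sha(content: string): string {
  return createHash("sha1").update(content).digest("hex");
}

// Compare the snapshot from the last index run against current file
// contents; only files whose hash changed (or are new) need re-indexing.
function changedFiles(
  previous: Map<string, string>, // path -> sha from last run
  current: Map<string, string>, // path -> current file content
): string[] {
  return [...current.entries()]
    .filter(([path, content]) => previous.get(path) !== sha(content))
    .map(([path]) => path);
}
```

Checking the tree SHA first is the cheap short-circuit: if it matches, nothing changed and the per-file comparison never runs.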

The trigger matters too. Right now it’s manual (run a CLI command after a batch of changes), but the natural evolution is git hooks or filesystem watchers that kick off re-indexing automatically on commit. For CI/CD environments, a webhook-triggered re-index on push keeps the production context layer current without any developer intervention. The KV cache layer has a TTL that handles the gap between index runs, so queries don’t serve truly stale data even if re-indexing hasn’t fired yet.
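The git-hook version of the trigger is a one-line config fragment. The CLI name and flag below are hypothetical stand-ins for whatever the indexer actually exposes.

```shell
#!/bin/sh
# .git/hooks/post-commit — hypothetical; CLI name and flag are illustrative.
# Kick off an incremental re-index in the background so the commit
# itself isn't slowed down by indexing.
repo-context index --incremental >/dev/null 2>&1 &
```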

A Generic Context Layer for Any Agent Tool

This pattern isn’t specific to code review or any single tool. Every agent tool that operates on a codebase would benefit from the same context layer. A test generator that knows your conventions. A documentation tool that understands your architecture. A migration assistant that knows your dependency graph.

Right now, every tool builder solves this independently or doesn’t solve it at all. Most tools ship context-blind and rely on the user to fill the gap through prompting.

What if the context layer was a shared service? One MCP server that indexes your repo and serves project understanding to any connected tool. Index once, query from anywhere. The code review tool, the refactoring tool, the test generator all draw from the same well of project knowledge.

That’s what I’m building with repo-context-service. It started as a way to give DiffPrism deeper project awareness during code review. But the more I build with it, the more obvious it becomes that this is infrastructure, not a feature. The agent tools of the future won’t just do things. They’ll know things.