The Problem
Nobody loves code review. You open a PR, scan the diff, catch a typo, approve it, move on. The deeper stuff like pattern consistency and architectural alignment takes real time and real focus. Most of us don’t have enough of either.
I wanted to build something that catches pattern drift. Not a linter. Not static analysis. Something that understands how your codebase handles a problem and tells you when new code does it differently.
DiffPrism: Starting as an MCP Tool
The first version was a Claude Code MCP tool. MCP (Model Context Protocol) lets you build tools that AI agents can call. DiffPrism gave Claude Code a browser UI for reviewing diffs. You could open a review, see the changes, and Claude would post inline comments.
It worked. But it had too much friction. You had to be in a Claude Code session. You had to prompt it. You had to have the MCP server running. Every step was manual.
Real code review happens on GitHub. PRs live there. Discussions live there. Merges live there. A separate tool will always be an extra step.
Pivoting to a GitHub App
Meet developers where they already are. A GitHub App receives webhooks when things happen. PR opened, code pushed, comment posted. No manual triggering.
The new DiffPrism is a Cloudflare Worker that listens for GitHub webhooks. Someone comments /review on a PR. DiffPrism fetches the diff, gathers codebase context, sends everything to Claude, and posts inline comments on the PR. The developer never leaves GitHub.
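The trigger check can be sketched as a small function over GitHub's issue_comment webhook payload. This is an illustrative sketch, not DiffPrism's actual code; the field names follow GitHub's documented payload, and `shouldTriggerReview` is a hypothetical helper.

```typescript
// Minimal shape of the issue_comment webhook payload fields we need.
// GitHub includes a `pull_request` key on the issue when the comment
// is on a PR rather than a plain issue.
interface CommentEvent {
  action: string;
  issue: { number: number; pull_request?: object };
  comment: { body: string };
}

// Decide whether this webhook delivery should start a review.
function shouldTriggerReview(event: CommentEvent): boolean {
  // Only react to newly created comments, not edits or deletions.
  if (event.action !== "created") return false;
  // Ignore comments on plain issues.
  if (!event.issue.pull_request) return false;
  // Trigger on a "/review" command at the start of the comment.
  return event.comment.body.trim().startsWith("/review");
}
```

If this returns true, the worker queues a review job and returns 200 immediately, so the webhook never waits on the slow work.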
Two Cloudflare Workers talk to each other through a service binding:
GitHub webhook
→ DiffPrism Worker (receives event, queues review job)
→ Queue consumer (fetches diff, gets context, calls Claude, posts comments)
→ Repo Context Service (semantic search, related files, conventions)
Service bindings let workers call each other directly without the public internet. The context service stays private but the GitHub App can still reach it.
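In wrangler config, a service binding is just a named reference to the other worker. A hypothetical fragment for the DiffPrism worker might look like this; `REPO_CONTEXT` and `repo-context-service` are assumed names, not the actual ones.

```toml
# Hypothetical wrangler.toml fragment for the DiffPrism worker.
# The binding appears on `env` inside the worker, so the context
# service can be called with env.REPO_CONTEXT.fetch(request)
# without ever exposing it to the public internet.
[[services]]
binding = "REPO_CONTEXT"
service = "repo-context-service"
```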
The Repo Context Service
This is where I have been spending the most time. The context service indexes your entire repository and makes it searchable. This is what makes DiffPrism different.
Install DiffPrism on a repo and it triggers indexing. The service walks the repo tree, splits code into semantic chunks at function and class boundaries, generates vector embeddings with Workers AI, and stores everything in Cloudflare Vectorize. It also tracks imports and exports to build an import graph.
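The chunking step can be illustrated with a simple splitter that starts a new chunk at each top-level function or class declaration. This is a sketch only; the real pipeline presumably does something more robust than a regex, but the heuristic keeps the example short.

```typescript
// Split JavaScript source into chunks at top-level function/class
// boundaries. Illustrative regex heuristic, not DiffPrism's parser.
function chunkSource(source: string): string[] {
  const lines = source.split("\n");
  const boundary = /^(export\s+)?(async\s+)?(function|class)\s+\w+/;
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of lines) {
    // A new declaration closes the previous chunk.
    if (boundary.test(line) && current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

Each chunk then gets its own embedding, so a search hit points at a function rather than a whole file.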
When a review happens, DiffPrism asks the context service two questions. What code is semantically similar to what changed? What files connect to the changed files through imports? These answers get fed to Claude alongside the diff. Claude can say “this error handling pattern differs from auth.ts” instead of just “consider adding error handling.”
The whole stack runs on Cloudflare. D1 for the database, R2 for storage, Vectorize for vector search, Workers AI for embeddings, Queues for async processing. $5/month covers everything at my current scale.
Improving Context Quality
I worked through a few quality issues I had identified along the way.
Combined-query problem. DiffPrism was embedding the entire PR diff as a single search query. A diff that touches routing, auth, and tests produces a diluted embedding that matches nothing well. I added a batch search endpoint that accepts multiple queries. Each changed file gets its own focused embedding. Three queries across express.js returned six unique relevant results. One combined query returned noise.
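Splitting the diff into per-file queries can be sketched as a pass over standard unified-diff headers. `perFileQueries` is a hypothetical helper; it collects only added lines per file, since those describe what the PR introduces.

```typescript
// Build one focused search query per changed file instead of one
// diluted query for the whole diff. Assumes standard unified-diff
// "diff --git a/... b/..." headers.
function perFileQueries(diff: string): Map<string, string> {
  const queries = new Map<string, string>();
  let file: string | null = null;
  for (const line of diff.split("\n")) {
    const header = line.match(/^diff --git a\/(\S+) b\/\S+/);
    if (header) {
      file = header[1];
      queries.set(file, "");
      continue;
    }
    // Keep added lines only; skip the "+++" file header.
    if (file && line.startsWith("+") && !line.startsWith("+++")) {
      queries.set(file, (queries.get(file) ?? "") + line.slice(1) + "\n");
    }
  }
  return queries;
}
```

Each entry is embedded separately, and the batch endpoint runs them all in one round trip.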
Path-only embeddings. The related files tool was embedding "File: src/router/index.js" as the vector query. Almost nothing to work with. I changed it to look up the file's exports and first chunk of content. Related-file scores jumped from 0.6 to 0.91. lib/request.js now finds lib/response.js as highly related. Obviously correct.
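The fix amounts to embedding a richer description of the file. A minimal sketch, with illustrative names (`IndexedFile` and `embeddingText` are not DiffPrism's actual API):

```typescript
// What the index already knows about a file.
interface IndexedFile {
  path: string;
  exports: string[];
  firstChunk: string;
}

// Build the text that gets embedded for related-file search:
// path, export names, and the file's leading content, instead of
// the path alone.
function embeddingText(file: IndexedFile): string {
  return [
    `File: ${file.path}`,
    `Exports: ${file.exports.join(", ")}`,
    file.firstChunk,
  ].join("\n");
}
```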
Missing conventions endpoint. DiffPrism could search code and find related files but could not ask about naming conventions, test patterns, or import style. I exposed the architecture and conventions tools as JSON API endpoints so the review prompt can include project standards.
Local indexing timeouts. Indexing a 200-file repo timed out because the entire pipeline ran synchronously. The embedding step alone takes minutes for hundreds of chunks. I split it into two phases. The HTTP request stores pre-walked data in R2 and queues a job. The queue consumer reads from R2 and runs the pipeline asynchronously. Express.js (209 files, 308 chunks) indexed without issues.
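The two-phase split can be simulated in-memory to show the control flow. In production the store would be R2 and the queue a Cloudflare Queue; here a Map and an array stand in so the sketch runs on its own, and all names are illustrative.

```typescript
type WalkedRepo = { repo: string; files: string[] };

const store = new Map<string, WalkedRepo>(); // stands in for R2
const jobQueue: string[] = [];               // stands in for Queues

// Phase 1: the HTTP handler stays fast. It persists the walked tree
// and enqueues a job key instead of embedding inline.
function handleIndexRequest(walked: WalkedRepo): string {
  const key = `index/${walked.repo}`;
  store.set(key, walked);
  jobQueue.push(key);
  return key;
}

// Phase 2: the queue consumer does the slow work (chunking,
// embeddings) with no HTTP timeout hanging over it.
function consumeNextJob(embed: (file: string) => void): boolean {
  const key = jobQueue.shift();
  if (!key) return false;
  const walked = store.get(key);
  if (!walked) return false;
  for (const file of walked.files) embed(file);
  return true;
}
```

The HTTP handler returns as soon as the job is queued; how long the embeddings take no longer matters to the caller.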
Testing at Scale
I indexed express.js to validate things beyond small repos. 209 files, 308 chunks, all processed async through the queue.
Search for “middleware error handling” returned the error-handling example file at 0.78 relevance. “Routing and route matching” returned router files. Architecture detection correctly identified JavaScript as the primary language, lib/ and test/ as the directory structure, index.js as the entry point, ESLint and Prettier as tooling. Conventions detected kebab-case naming and CommonJS imports. All correct.
Where This is Going
DiffPrism reviews when you comment /review. The next steps are about making it useful enough to charge for.
Authentication first. The context service is protected by a single shared secret. Real multi-tenancy with per-user access control needs to happen before anyone else uses this.
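The current shared-secret check is roughly this shape, sketched with Node's `crypto.timingSafeEqual` for a constant-time comparison (`isAuthorized` is an illustrative name; the actual implementation may differ, and per-user tokens would replace this entirely).

```typescript
import { timingSafeEqual } from "node:crypto";

// Compare a request's auth header against the single shared secret.
// Constant-time comparison avoids leaking the secret via timing.
function isAuthorized(header: string | null, secret: string): boolean {
  if (!header) return false;
  const a = Buffer.from(header);
  const b = Buffer.from(secret);
  // timingSafeEqual throws on length mismatch, so check first.
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}
```

On Workers this needs the `nodejs_compat` flag for the `node:crypto` import.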
Then Stripe billing. Infrastructure costs are low enough that a modest subscription price has excellent margins. Free tier with limited reviews per month, pro tier for unlimited.
Automatic reviews on every PR. The GitHub App already receives push and PR events. Triggering on PR open or new commits is straightforward.
Better review quality. The model is Haiku right now for cost reasons while testing. Sonnet for production will produce better reviews. The context can get smarter too. It could look at recent commit history to understand what patterns are actively changing.
The end goal is a code review tool that knows your codebase well enough to catch what humans miss when reviewing at speed. Not replacing human review. Making it better.
Cheers!
Will