One way into my coaching agent had no rate limit on it. The in-workout builder. It runs the same loop as the chat box, up to eight model calls a turn, and nothing capped how fast you could hit it. The chat box had a limiter. The builder was one line away from the same one and did not have it. Easy exploit. Hammer the unguarded route, run up the Claude bill.

Then I ran the attack, and the route that did have a limiter did not stop me either.

What I Expected

Three routes reach the coach loop. Two call a rate limiter before they spend anything. The builder skips it. Same expensive loop behind all three.

So I expected a clean split. Burst the chat route and it caps me at fifteen a minute. Burst the builder and it runs every one. One guarded, one open. A tidy before and after.

I wrote a small script. Twenty requests at each, as fast as they would go.
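
A script of that shape might look like the sketch below. It is illustrative, not the original. The `send` callable is a stand-in for whatever makes the real HTTP request, and treating 429 as the "throttled" status is an assumption about how the routes report a rate-limit hit.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def burst(send: Callable[[], int], n: int = 20) -> Counter:
    """Fire n requests concurrently and tally the status codes that come back."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(send) for _ in range(n)]
        return Counter(f.result() for f in futures)


def fake_send() -> int:
    # Hypothetical stand-in for a real request; always reports success.
    return 200


tally = burst(fake_send, n=20)
# Assumption: 429 is the status a throttled request would return.
print(tally[200], "answered,", tally[429], "throttled")  # 20 answered, 0 throttled
```

Swapping `fake_send` for a function that posts to a staging route and returns the response status gives the real probe.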

What Happened

Both routes answered all twenty. Zero throttled. The guarded one did not guard.

Chasing the Wrong Guess

My first thought was that the limiter was failing open. The code does that on purpose. If the limiter is not wired up, it lets the request through rather than break the route. I had written that down as the likely hole before I started.

So I checked instead of trusting the guess. I watched the worker’s own logs while a request went through. A missing limiter logs a line that says so. The line never came. The limiter was wired up. The guess was wrong.
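
The fail-open behaviour described above can be sketched roughly like this. The names `check_rate_limit`, `Limiter`, and the log message are hypothetical, chosen for illustration. The point is only the shape: a missing limiter logs a line and lets the request through.

```python
import logging
from typing import Optional, Protocol

log = logging.getLogger("coach")


class Limiter(Protocol):
    def allow(self, key: str) -> bool: ...


def check_rate_limit(limiter: Optional[Limiter], key: str) -> bool:
    """Return True if the request may proceed."""
    if limiter is None:
        # Fail open: a missing limiter must not break the route,
        # but it leaves a log line so the gap is visible.
        log.warning("rate limiter not configured; allowing request for %s", key)
        return True
    return limiter.allow(key)


class DenyAll:
    """Toy limiter that rejects everything, to show the wired-up path."""

    def allow(self, key: str) -> bool:
        return False


print(check_rate_limit(None, "user-1"))       # True, and a warning is logged
print(check_rate_limit(DenyAll(), "user-1"))  # False
```

Watching for that warning line is what separates "limiter missing" from "limiter present but not catching the burst".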

Then I pointed the same kind of burst at the login route. It has a tighter cap and costs nothing to hit. It threw the rate-limit error right away. So the limiter works. It was just not catching a burst on the coach routes.

A Rate Limit Is Not a Cost Limit

Two things were going on.

The limiter counts requests over a window. Throw them all at once and the count lags behind. A few slip past before it catches up. Even the tight login cap let a couple extra through. A burst is the exact shape that beats it.
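
A toy model of that lag, under a loud assumption: the check sees a count that trails the true count by a fixed number of in-flight requests. Real limiters lag for their own reasons, and this is not how any particular one works. `LaggingWindowLimiter` and its `lag` parameter are invented for the illustration.

```python
class LaggingWindowLimiter:
    """Counter whose check trails the true count by `lag` requests (assumed model)."""

    def __init__(self, cap: int, lag: int) -> None:
        self.cap = cap
        self.lag = lag
        self.seen = 0  # true number of requests in the current window

    def allow(self) -> bool:
        # The check only sees requests that have already been recorded.
        visible = max(0, self.seen - self.lag)
        self.seen += 1
        return visible < self.cap


def allowed_in_burst(cap: int, lag: int, n: int = 20) -> int:
    limiter = LaggingWindowLimiter(cap, lag)
    return sum(limiter.allow() for _ in range(n))


print(allowed_in_burst(cap=5, lag=0))  # 5: no lag, the cap holds exactly
print(allowed_in_burst(cap=5, lag=2))  # 7: two extra slip past in a burst
```

In this model the overshoot equals the lag, so the faster the burst relative to how quickly the count settles, the more gets through.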

The bigger one is simpler. The cap is fifteen a minute. Each of those fifteen runs a loop of up to eight model calls. That is up to a hundred and twenty model calls a minute, for one user, and the limiter is fine with all of it. It capped the number of requests. The number that costs me money is the number of model calls. Those are not the same number. A rate limit is not a cost limit.
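
The arithmetic, written out. The two constants come from the numbers above. The function name is just for illustration.

```python
REQUESTS_PER_MINUTE = 15         # the limiter's cap
MAX_MODEL_CALLS_PER_REQUEST = 8  # the coach loop's ceiling per turn


def worst_case_model_calls(requests_per_minute: int, calls_per_request: int) -> int:
    """Upper bound on model calls per minute that the rate limit still permits."""
    return requests_per_minute * calls_per_request


print(worst_case_model_calls(REQUESTS_PER_MINUTE, MAX_MODEL_CALLS_PER_REQUEST))  # 120
```

The limiter bounds the first factor only. The bill scales with the product.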

The Fix, and the Fix I Didn’t Write

I added the missing guard. The builder now calls the same limiter as the chat box, before it spends anything. That closes the gap I came in for. All three routes match.

It does not close the real one. The same burst still slips past all three, just evenly now. The real fix is a ceiling on what a single request can spend, and a check inside the loop, so one burst or one heavy turn cannot run the bill up no matter which route it comes through. That is a bigger change. I have not written it yet.
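
One possible shape for that ceiling, as a sketch of a change that is not written yet, so every name here is hypothetical. `call_model` stands in for a single model call and is assumed to return the tokens it used plus a flag for whether the turn is finished. The budget is checked inside the loop, before each call.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Budget:
    max_tokens: int
    spent: int = 0

    def exhausted(self) -> bool:
        return self.spent >= self.max_tokens


def run_turn(
    call_model: Callable[[], Tuple[int, bool]],
    budget: Budget,
    max_calls: int = 8,
) -> int:
    """Run the loop until done, out of calls, or out of budget. Returns calls made."""
    calls = 0
    for _ in range(max_calls):
        if budget.exhausted():
            break  # the cost ceiling, not the request count, ends the turn
        tokens, done = call_model()
        calls += 1
        budget.spent += tokens
        if done:
            break
    return calls


def hungry_model() -> Tuple[int, bool]:
    # Toy model call: uses 400 tokens and never finishes on its own.
    return 400, False


budget = Budget(max_tokens=1000)
print(run_turn(hungry_model, budget))  # 3: the third call crosses the line, the fourth never runs
```

Because the check happens before each call, the last call can overshoot the ceiling by up to one call's worth of tokens. Sharing one `Budget` across a user's requests, rather than creating one per request, would be the way to bound a burst as well as a single heavy turn.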

OWASP names this one. In their top ten for agentic apps, bill spikes from a runaway agent fall under tool misuse, and the fix they list is a budget, not a rate. A ceiling on cost or tokens that cuts the agent off when it crosses the line. Same lesson, from a standards body instead of a staging burst.

What I did write, next to the guard, is a test. It fails if any paid route ever ships without the limiter again. The copy-paste gap that started this cannot come back quietly.
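
A test of that kind could look something like this. It assumes routes are declared in a registry that records which guards each one runs. The `Route` type, the paths, and the `"rate_limit"` guard name are all made up for the sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    path: str
    paid: bool                   # True if the route can trigger model calls
    guards: Tuple[str, ...] = ()


# Hypothetical registry; the real route table will look different.
ROUTES = [
    Route("/chat", paid=True, guards=("rate_limit",)),
    Route("/builder", paid=True, guards=("rate_limit",)),
    Route("/login", paid=False, guards=("rate_limit",)),
]


def unguarded_paid_routes(routes: List[Route]) -> List[str]:
    """Paths of paid routes that do not run the rate limiter."""
    return [r.path for r in routes if r.paid and "rate_limit" not in r.guards]


def test_every_paid_route_is_rate_limited() -> None:
    assert unguarded_paid_routes(ROUTES) == []


test_every_paid_route_is_rate_limited()
```

The test only proves the limiter is attached. It says nothing about whether the limiter holds under a burst, which is the separate problem above.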

Where That Leaves Me

I went in to prove one route was unguarded. It was. I came out knowing the guard on the other routes is weaker than I had trusted. I would take that trade. The control I was about to lean on, I now know not to lean on.

This is the same agent I have been taking apart in public, one attack at a time. The last few rounds were about what the model believes and what it will obey. This one was about what it costs.

Claude ran the staging probes and read the worker logs with me. The calls about what the results meant, I checked myself.