Last time I found a burst that ran up my coaching agent’s Claude bill and couldn’t stop it. I ended that post with a confession. The real fix was a ceiling on what one account could spend, and I hadn’t written it yet. So this is me writing it.
I didn’t write the fix I promised, though. I wrote a better one.
The Hole, in Dollars
First I made the old finding concrete. The coach loop runs on a paid model, up to eight calls a turn. So I minted one throwaway account and fired twenty chat turns at it all at once.
All twenty ran. None got throttled. The burst spent about seventy-five cents in fourteen seconds. The free budget for a whole month is a quarter. So that’s one account, one burst, three months of budget gone in the time it takes to read this paragraph. Run it back to back and there’s no ceiling at all.
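The arithmetic behind "three months", as a quick sketch. The figures are the rounded ones quoted above, in cents, and the variable names are only illustrative. The per-turn number is an average across the burst, not a measured cost for any single turn.

```python
# Rounded figures from the burst described above, in cents.
burst_cost_cents = 75
monthly_free_budget_cents = 25
turns_in_burst = 20

# How many months of free budget one burst consumed.
months_of_budget = burst_cost_cents / monthly_free_budget_cents

# Average cost per turn across the burst (an average, not a per-turn reading).
avg_cost_per_turn_cents = burst_cost_cents / turns_in_burst

print(months_of_budget)         # 3.0
print(avg_cost_per_turn_cents)  # 3.75
```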
I could put a number on it because the server writes down what every run cost. I read the bill, not the reply.
Why All Twenty Got Through
There’s a quota. Every account gets a monthly budget and a check that runs before the coach spends anything. It reads how much you’ve used this month and stops you if you’re over.
But the order matters. It reads how much you’ve used, then it spends. Send one request and that’s fine. Send twenty at once and all twenty read the same number, the one from before any of them ran. They all see room. They all go. By the time the usage actually lands, the money’s already spent.
The check was written for one request at a time. And a burst is twenty requests all pretending to be the first one. The rate limiter had the same blind spot. Two gates, and neither one could see the other requests in the burst.
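A minimal sketch of that read-then-spend ordering, not the app's actual code. `Account`, `coach_turn`, and the flat four-cent turn cost are all made up for illustration; only the quarter budget and the burst of twenty come from the story above. The `asyncio.sleep` stands in for the slow paid model calls.

```python
import asyncio

MONTHLY_BUDGET_CENTS = 25  # the free monthly budget
TURN_COST_CENTS = 4        # assumed flat cost per turn, for illustration only


class Account:
    """Hypothetical account record holding this month's usage."""

    def __init__(self):
        self.used_cents = 0


async def coach_turn(account):
    # Gate: read how much has been used so far...
    if account.used_cents >= MONTHLY_BUDGET_CENTS:
        return False
    # ...then spend. While this request waits on the model, every other
    # request in the burst gets to run the same check against the same number.
    await asyncio.sleep(0.01)
    account.used_cents += TURN_COST_CENTS
    return True


async def burst(n):
    account = Account()
    results = await asyncio.gather(*(coach_turn(account) for _ in range(n)))
    return sum(results), account.used_cents


admitted, spent = asyncio.run(burst(20))
print(admitted, spent)  # 20 80
```

Every request reads zero before any of them has written anything, so all twenty are admitted and the account ends up at 80 cents against a 25-cent budget. Sent one at a time, the same check would have stopped the seventh.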
The Lock I Already Had
The fix I promised last time was a spending ceiling inside the loop. It’s real, but it’s a big change, and honestly it still wouldn’t stop twenty requests from racing to the gate together.
The thing that stops a race is a lock. And I already had one.
Every user in this app gets their own little object on the server. It holds their data and it runs one thing at a time. That single-file property is the whole reason the app’s safe in a few other ways, and I’d never once used it here.
So I moved the gate into the object. Before a coach turn spends anything, it asks that object for a slot. The object counts how many turns are already running for this user and hands out at most three. The fourth one waits, or gets turned away. And because the object only does one thing at a time, the twenty requests can’t all read the same number anymore. They’ve got to line up. Three go, the rest bounce.
Three isn’t a real limit for a person. Nobody sends four coach messages in the same instant. It’s only a limit for a burst.
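Roughly what that gate looks like, as a sketch under assumptions rather than the deployed code. `UserObject`, `try_acquire`, and `release` are hypothetical names. The one-thing-at-a-time property is imitated here by keeping the check and the increment in a plain method with no `await`, so nothing else on the event loop can run between them. This version turns the fourth request away instead of making it wait.

```python
import asyncio

MAX_CONCURRENT_TURNS = 3  # the per-user cap described above


class UserObject:
    """Hypothetical stand-in for the per-user object."""

    def __init__(self):
        self.running = 0

    def try_acquire(self):
        # No await in here: the read and the increment happen as one step,
        # so two requests can never both see the same count.
        if self.running >= MAX_CONCURRENT_TURNS:
            return False
        self.running += 1
        return True

    def release(self):
        self.running -= 1


async def coach_turn(user):
    if not user.try_acquire():
        return "rejected"
    try:
        await asyncio.sleep(0.01)  # stands in for the paid model calls
        return "ran"
    finally:
        user.release()  # always give the slot back, even on failure


async def burst(n):
    user = UserObject()
    return await asyncio.gather(*(coach_turn(user) for _ in range(n)))


results = asyncio.run(burst(20))
print(results.count("ran"), results.count("rejected"))  # 3 17
```

In this toy version every request arrives in the same instant, so exactly three run. A real burst arrives spread over time, so slots can free up and be reused, and the count that gets through can be higher.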
I deployed it and ran the same twenty. Fifteen came back rejected. The burst’s bill dropped from about seventy-five cents to sixteen. And the part I care about more than the burst: the monthly cap actually works now. It could never hold before, because a burst of any width could blow straight past it. Now the most that gets through at once is three, so the cap’s a cap again.
What It Still Doesn’t Do
The ceiling I promised last time, the one inside the loop that cuts off a single runaway turn, I still haven’t written. The concurrency cap makes it matter less. It doesn’t replace it. This is one layer, and I can prove it holds. I don’t get to call it done.
There’s also the obvious way around all of it. Just make more accounts. Registration’s only limited per address, so a wide enough net still spends real money. That one isn’t a bug I patch in an afternoon. That’s the free tier, and the answer to it is a card on file, not a cleverer gate.
The Pattern, Again
This is the same agent I keep taking apart in public, and the lesson rhymes with all the others. The gate I wrote into the request path turned into a suggestion the second twenty requests showed up together. The gate that held was the one built into the structure, the object that can only do one thing at a time. I didn’t add a rule. I leaned on something the system already had.
I went in to write a spending ceiling. I came out having capped concurrency instead, because the burst was a race and the cure for a race is a lock. Turns out the best fix was one the architecture had been holding out the whole time.
Claude ran the staging burst and read the cost traces with me. The calls about what the numbers meant, I checked myself.