How a Security Audit Taught Me to Budget AI Costs in Code (Not Prompts)

Four targeted fixes confirmed by the advisor. Applying them now:

TLDR: An insight-regeneration cooldown and a row-count cap (returning 413) stopped a potential runaway bill cold. Add both before any AI feature ships — in code, not in the prompt.

the audit

We run a periodic security sweep I call the CSO pass — a /cso skill that walks the full product fleet looking for auth gaps, exposed routes, and anything that would make a security person wince.

This one covered my main client, an ecommerce business — six apps across Vercel.

I expected the usual. Missing requireAuth() wrappers. An unguarded server action or two.

What I didn't expect to find was the AI cost exposure.

what we left wide open

A webinar analytics dashboard has an AI insight regeneration feature.

It reads webinar attendance data, runs it through Claude, and generates analytics commentary right in the dashboard.

It worked beautifully.

There was also absolutely nothing stopping someone from clicking "Regenerate Insights" fifty times in a row.

Every click is a metered API call. They stack FAST.

A lead importer tool was worse — it accepts a CSV upload of webinar attendees and runs each row through an LLM + CRM pipeline. No row cap. No upload limit. A 10,000-row file would just... go.

After Anthropic's June 2026 billing change moved API usage off flat Max plan credits and onto metered per-call pricing, that stopped being a theoretical concern and became a real one.

what doesn't work

My first instinct? Prompt-layer fixes.

Batch carefully. Tell the model to be conservative. Add a system-prompt note about cost.

I'll be honest — that instinct is completely wrong.

A user hitting "Regenerate" doesn't know or care what's in your system prompt. The call fires. The meter runs. The prompt shapes the output. It does not cap the invocations.

the fix that worked

Two controls. Both in code, before the model is ever reached.

Insight regeneration cooldown — a server-side timestamp gate. If insights were generated in the last N minutes, the endpoint returns early with no model call. Zero cost, no user drama. (CSO F6 on the webinar analytics dashboard.)
Row count cap + body size limit — any upload over the row ceiling gets a 413 Payload Too Large response before a single row touches the pipeline. We also dropped bodySizeLimit from 100mb to 25mb. The lead importer's commit message actually says it plainly: "cap upload size and row count to bound LLM/CRM cost." (CSO F5 and P2.)

Neither fix is clever. Both are completely obvious in hindsight.

That's usually how it goes.

why this matters

Every AI feature I ship now gets two questions before it goes live:

What happens if someone hammers this endpoint for ten minutes?
What's the maximum possible input size — and is there a hard gate before the model ever sees it?

If I can't answer both, it's not ready.

The prompt is your intent. The cap is your budget. You need both.

An open AI surface isn't a bug in the traditional sense — your tests will pass, your users won't complain. But it's a bill waiting to happen, and the meter doesn't care about your good intentions.

Cap it in code. Do it before you ship.