TL;DR: If you have an AI chat endpoint and no body size cap, you have a denial-of-wallet hole. Cap it in your app — not your platform.
256 KiB is a reasonable ceiling.
The Setup
I've been building a practice management system for a law firm client, and it has an AI chat feature.
User types something. Route handler picks it up. Body goes straight into an LLM prompt.
Nothing unusual there.
What I Caught (Before It Caught Me)
I was doing a security sweep — tidying up a few rough edges on the chat route — when I noticed there was zero body size enforcement at the app layer.
No attacker had hit us. No bill spike. No incident.
Just me, staring at a route handler that would cheerfully accept… anything.
The Wrong Assumption
My first thought was: "doesn't the platform handle this?"
And it does — sort of.
Vercel hard-caps every serverless request body at 4.5MB before the request even reaches Next.js. No config change can raise it. It's a platform-level wall.
Here's the problem: 4.5MB is not protecting your budget. It's protecting your server.
Run the math.
My own empirical measurements say a megabyte of text is roughly 256K tokens. So Vercel's 4.5MB ceiling allows ~1.15 million tokens in a single request.
At today's LLM pricing, that one request could cost more than a thousand normal chat turns.
Flood the endpoint with a few concurrent requests and the numbers get ugly fast.
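Here is that arithmetic as a rough TypeScript sketch. It assumes the measurement above (about 256K tokens per megabyte of text) and treats a megabyte as 1024 × 1024 bytes, which is a simplification. The names TOKENS_PER_MIB and worstCaseTokens are made up for illustration, and real tokenizers will vary with the content.

```typescript
// Assumption taken from the post: roughly 256K tokens per megabyte of text.
// Real token counts depend on the tokenizer and on the text itself.
const TOKENS_PER_MIB = 256_000;
const KIB = 1024;
const MIB = 1024 * KIB;

// Worst-case token count if an entire request body is forwarded as prompt text.
function worstCaseTokens(bodyBytes: number): number {
  return Math.round((bodyBytes / MIB) * TOKENS_PER_MIB);
}

const platformCapBytes = 4.5 * MIB; // the platform's 4.5 MB limit, treated as MiB here
const appCapBytes = 256 * KIB;      // the 256 KiB ceiling from the TL;DR

console.log(worstCaseTokens(platformCapBytes)); // 1152000
console.log(worstCaseTokens(appCapBytes));      // 64000
console.log(platformCapBytes / appCapBytes);    // 18
```

Multiply the first number by your provider's per-token input price and by the number of concurrent requests to get a feel for the exposure.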
This is called cost-amplification DoS — and it's a DIFFERENT threat class from the usual "take down the server" attack. It doesn't crash anything. It just empties your account while the logs look perfectly normal.
The Fix
One change. App layer. Route handler.
Cap the request body at 256 KiB and return a 413 if anything larger comes in.
That's comfortably above any realistic chat message (a very long user message might hit 2–3 KiB), and it puts the ceiling at roughly 64K tokens, which is still more than you'd ever want to forward to an LLM from a chat input.
The platform cap is too generous by a factor of 18.
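A minimal sketch of what that cap could look like in a Next.js App Router route handler. This is not the exact code from the project: MAX_BODY_BYTES, declaredSizeTooLarge, and readTextCapped are illustrative names, and the step that forwards to the LLM is left out. It checks Content-Length first, then counts bytes while reading, because the header can be missing (chunked transfer) or simply wrong.

```typescript
// Illustrative names throughout; adapt to your own route.
const MAX_BODY_BYTES = 256 * 1024; // 256 KiB

// Cheap first check: reject early when the client declares an oversized body.
function declaredSizeTooLarge(contentLength: string | null, limit: number): boolean {
  if (contentLength === null) return false; // header absent; the streaming check below still applies
  const declared = Number(contentLength);
  // A malformed header is treated as too large rather than guessed at.
  return !Number.isInteger(declared) || declared < 0 || declared > limit;
}

// Second check: count bytes while reading, so a missing or false
// Content-Length cannot let a larger body through.
// Returns null when the body exceeds the limit.
async function readTextCapped(request: Request, limit: number): Promise<string | null> {
  if (request.body === null) return "";
  const reader = request.body.getReader();
  const decoder = new TextDecoder();
  let received = 0;
  let text = "";
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    const chunk = result.value;
    received += chunk.byteLength;
    if (received > limit) {
      await reader.cancel(); // stop reading; the rest is not needed
      return null;
    }
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush any buffered partial character
}

export async function POST(request: Request): Promise<Response> {
  if (declaredSizeTooLarge(request.headers.get("content-length"), MAX_BODY_BYTES)) {
    return new Response("Request body too large", { status: 413 });
  }
  const body = await readTextCapped(request, MAX_BODY_BYTES);
  if (body === null) {
    return new Response("Request body too large", { status: 413 });
  }
  // Parse `body` and forward it to the LLM here (omitted in this sketch).
  return new Response(null, { status: 204 });
}
```

The header check is the fast path. The byte counter is the one that actually enforces the limit, so the cap holds even when the header is absent or inaccurate.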
Why This Matters to Me
Every endpoint that forwards user input to a metered API is a wallet attack surface.
It's not just LLMs — same principle applies anywhere you pay per byte processed. But with AI, the amplification is ENORMOUS and the bill arrives quietly.
You're not going to get paged when your budget is draining. You're going to wake up to an invoice.
Cap the body. Size it to what a real user actually needs. Don't let the platform's ceiling become yours by default.
P.S. If you're on Next.js App Router, this is a two-minute fix. Read Content-Length, check against your limit, return a 413 before you parse anything. Don't wait for a reason.