The Silent-Fail Trap: Why HTTP 200 Doesn't Mean Your LLM Did Anything

TLDR: An HTTP 200 from an LLM API tells you the server accepted the request and sent back a response. It says nothing about whether the model finished the task. Always check stop_reason and confirm the text content is non-empty before trusting the response.

the setup

I've been building Apollo, my personal AI operating layer, and part of that is a subagent delegation system.

Big tasks go out to MiniMax M2.7 via api.minimax.io/anthropic (my default delegate), with GLM-5.1 via Z.AI as the fallback when MiniMax rate-limits or flakes.

Both expose Anthropic-compatible endpoints — same shape, familiar SDK, easy to wire up.

So I wired them up, shipped, and moved on.

the wall

A while later I noticed a pattern: the delegate was called, I got a 200 back, Apollo reported "done"… and nothing had happened.

No file changes. No output. The task just… evaporated.

My first instinct: retry. Same result. Retry again. Still nothing.

I was debugging the wrong layer entirely.

what was actually going on

The 200 was real. The task was a ghost.

Three specific failure shapes I eventually catalogued:

Z.AI: returns 200 with stop_reason: "model_context_window_exceeded" and content[0].text completely empty. The model hit its context ceiling and said so, quietly, inside a 200.
Z.AI (again): returns 200 with a zero-length content[0].text and no clear stop reason at all. The root cause is still not fully characterized; it may be content-specific. I genuinely don't know why this one happens and I'm not pretending otherwise.
MiniMax: returns 200 where content has a thinking block but NO text block, and stop_reason: "max_tokens". I had set max_tokens absurdly low at first, so the model spent its whole budget thinking and ran out before it could answer.
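
A minimal Python sketch of how these three shapes could be told apart once the response body is parsed. It assumes an Anthropic-style body (a dict with stop_reason and a content list of typed blocks); the function name classify_response and the labels it returns are hypothetical, chosen only for illustration.

```python
from typing import Any


def classify_response(body: dict[str, Any]) -> str:
    """Label a parsed message body as 'ok' or one of the silent-failure shapes."""
    stop_reason = body.get("stop_reason")
    blocks = body.get("content") or []

    # Join every text block; content[0] may be a thinking block, not text.
    text = "".join(
        (b.get("text") or "") for b in blocks if b.get("type") == "text"
    ).strip()
    has_thinking = any(b.get("type") == "thinking" for b in blocks)

    if stop_reason == "model_context_window_exceeded":
        return "context_window_exceeded"
    if stop_reason == "max_tokens":
        # Budget ran out: either before any answer text, or partway through it.
        return "thinking_only" if has_thinking and not text else "truncated"
    if not text:
        return "empty_text"
    return "ok"
```

Scanning all text blocks rather than reading content[0] matters here: in the MiniMax case the first block is a thinking block, so content[0].text would not exist at all.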

That last one I earned.

the fix that actually worked

Stop trusting the envelope. Verify the artifact.

I now run this checklist on every delegate call, no exceptions:

HTTP status is 200 ✓
stop_reason is end_turn, not model_context_window_exceeded and not max_tokens
content contains at least one text block, and its text is non-empty and non-trivial
If a file was supposed to change, verify the file actually changed
Stderr log at /tmp/<provider>-*.log is clean

If anything in that list fails: do not retry blind. Diagnose. The 200 is a red herring. The real signal is in stop_reason and content length.
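
The checklist above can be sketched as a single function that returns a list of problems instead of a boolean, so a failure points at what to diagnose. This is a sketch under stated assumptions, not my production code: verify_delegate_call, file_digest, the min_chars threshold of 20, and treating a "clean" log as an empty file are all hypothetical choices, and log_glob is whatever pattern matches your own /tmp/<provider>-*.log files.

```python
import glob
import hashlib
import os
from typing import Any, Optional


def file_digest(path: str) -> Optional[str]:
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def verify_delegate_call(
    status: int,
    body: dict[str, Any],
    *,
    min_chars: int = 20,                 # assumption: what counts as "non-trivial"
    target_path: Optional[str] = None,   # file the task was supposed to change
    digest_before: Optional[str] = None, # file_digest(target_path) taken before the call
    log_glob: Optional[str] = None,      # pattern for the provider's stderr logs
) -> list[str]:
    """Run the checklist; an empty list means every check passed."""
    problems: list[str] = []

    if status != 200:
        problems.append(f"HTTP status {status}")

    stop_reason = body.get("stop_reason")
    if stop_reason != "end_turn":
        problems.append(f"stop_reason is {stop_reason!r}, expected 'end_turn'")

    blocks = body.get("content") or []
    text = "".join(
        (b.get("text") or "") for b in blocks if b.get("type") == "text"
    ).strip()
    if len(text) < min_chars:
        problems.append(f"text content is only {len(text)} chars")

    if target_path is not None and file_digest(target_path) == digest_before:
        problems.append(f"{target_path} did not change")

    if log_glob is not None:
        # Assumption: a "clean" stderr log is an empty one.
        for path in sorted(glob.glob(log_glob)):
            if os.path.getsize(path) > 0:
                problems.append(f"stderr log is not empty: {path}")

    return problems
```

To use it, take file_digest of the target before calling the delegate, pass that in as digest_before afterwards, and only report "done" when the returned list is empty. Anything else goes to diagnosis, not to a retry loop.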

why this matters to me

I'd been thinking about "did the API call succeed" when I should have been thinking about "did the task succeed."

Those are different questions. The HTTP status only describes how the request and response were carried. It knows nothing about whether the LLM finished what you asked.

Every builder who delegates to an LLM subagent, whether it's MiniMax, Z.AI, or any Anthropic-compatible endpoint, is likely to hit this exact trap. The response looks fine. The job isn't done.

Check stop_reason. Check the content. Then trust the result.

P.S. The claude -p CLI has the same trap — exit code 0 is not a finished turn. Check is_error == false AND stop_reason in (None, "end_turn") AND a non-empty .result before you parse anything.
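
A rough Python sketch of that CLI guard. It assumes the command's stdout was captured as a single JSON object carrying the is_error, stop_reason, and result fields named above; parse_cli_result is a hypothetical helper, and the exact output shape may differ between CLI versions.

```python
import json


def parse_cli_result(exit_code: int, stdout: str) -> str:
    """Return the result text only if every guard passes; raise otherwise."""
    if exit_code != 0:
        raise RuntimeError(f"CLI exited with code {exit_code}")

    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError("CLI stdout is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise RuntimeError("CLI stdout is not a JSON object")

    # Require an explicit false; a missing is_error field is treated as suspect.
    if payload.get("is_error") is not False:
        raise RuntimeError("is_error is not false")

    stop_reason = payload.get("stop_reason")
    if stop_reason not in (None, "end_turn"):
        raise RuntimeError(f"unexpected stop_reason: {stop_reason!r}")

    result = payload.get("result")
    if not isinstance(result, str) or not result.strip():
        raise RuntimeError("result is missing or empty")
    return result
```

Raising on every failed guard, rather than returning an empty string, keeps a half-finished turn from flowing into whatever parses the result next.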