The Setup
Apollo (my personal AI assistant) runs an hourly scanner — a two-stage pipeline that processes inbound business signals.
Stage 1 (Claude Sonnet as collector) reads raw inputs and emits a list of ScanItem objects.
Stage 2 (Claude Opus as judge) scores each item against a Classification enum.
Both stages feed a ScanResult model built with Pydantic (the schema validation library I use).
Simple enough. Except it kept silently dying on me.
What Was Breaking
Every so often the pipeline would just… produce nothing.
No crash alert. No iMessage. Just silence.
I'd dig into scan.log and find a Pydantic ValidationError halfway through batch construction.
One item.
One item with a slightly off enum value, or a missing optional field, or an extra key the LLM threw in — and the entire batch was gone.
I was handing Pydantic the whole list at once. It hit a bad item, threw, and that was that. Forty-nine good items, torched.
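Here's a minimal sketch of that failure mode, assuming Pydantic v2. The field names and the Classification members are made up for illustration; the real models aren't shown in this post.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Classification(str, Enum):
    # Hypothetical members; the real enum is not shown in the post.
    LEAD = "lead"
    NOISE = "noise"


class ScanItem(BaseModel):
    # Hypothetical fields for illustration.
    title: str
    classification: Classification


class ScanResult(BaseModel):
    items: list[ScanItem]


raw_items = [
    {"title": "Inbound partnership email", "classification": "lead"},
    {"title": "Newsletter", "classification": "Noise"},  # wrong capitalization
    {"title": "Pricing question", "classification": "lead"},
]

try:
    # Batch-level construction: every item is validated in one call.
    result = ScanResult(items=raw_items)
    batch_survived = True
except ValidationError as e:
    # One bad enum value rejects the whole list, good items included.
    batch_survived = False
    error_count = e.error_count()
```

Two of the three items are fine, but nothing comes out the other side: the single capitalization mismatch is enough to raise for the whole ScanResult.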
What I Tried First (Wrong)
I tightened the prompt. Added explicit enum values. Added "do not include any keys not listed."
Helped a little.
But LLMs are probabilistic. Sonnet would occasionally capitalize an enum value differently. Opus would wrap JSON in markdown fences it wasn't supposed to. The real-world distribution of failures is wider than any prompt can fully constrain — I had to accept that.
I also tried extra="forbid" in Pydantic, explicitly rejecting unexpected keys. That made things WORSE. Now a well-intentioned extra field from the model nuked the whole batch. I was making the validator more brittle, not more resilient.
The Fix That Actually Worked
Per-item validation with graceful degradation.
Instead of building ScanResult all at once, loop over each raw item and validate it individually:
valid_items = []
for raw in raw_items:
    try:
        valid_items.append(ScanItem.model_validate(raw))
    except ValidationError as e:
        logger.warning(f"Skipping invalid item: {e}")
One bad item logs a warning and gets dropped.
The other 49 items go through.
I also flipped extra="forbid" to extra="ignore" — if the model adds a key I didn't ask for, ignore it and move on. And I added a specific enum guard in judge.py for the Classification field: if the value doesn't match exactly, coerce to a safe fallback rather than throwing.
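Roughly what those two model-level changes can look like, as a sketch rather than the actual judge.py code. It assumes Pydantic v2; the UNKNOWN fallback member, the field names, and the choice to put the guard in a field validator are all my assumptions for the example.

```python
from enum import Enum
from typing import Any

from pydantic import BaseModel, ConfigDict, field_validator


class Classification(str, Enum):
    # Hypothetical members; UNKNOWN is an assumed "safe fallback".
    LEAD = "lead"
    NOISE = "noise"
    UNKNOWN = "unknown"


class ScanItem(BaseModel):
    # extra="ignore": unexpected keys are dropped instead of raising.
    model_config = ConfigDict(extra="ignore")

    title: str
    classification: Classification

    @field_validator("classification", mode="before")
    @classmethod
    def coerce_classification(cls, value: Any) -> Classification:
        # Runs before Pydantic's own enum check. An exact match is kept;
        # anything else becomes the fallback instead of a ValidationError.
        try:
            return Classification(value)
        except ValueError:
            return Classification.UNKNOWN


item = ScanItem.model_validate(
    {"title": "Newsletter", "classification": "Noise", "confidence": 0.9}
)
```

Here the miscapitalized "Noise" lands on Classification.UNKNOWN and the stray "confidence" key is silently discarded. Whether to normalize case before falling back is a judgment call; this sketch sticks to exact matching.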
Three small changes. The scanner hasn't silently emptied since.
The Lesson
LLM output is structurally unreliable by design. That's not a knock on the models — it's just the physics of the thing. They're not deterministic serializers.
The rule I've landed on: validate at the item boundary, not the batch boundary.
A batch can survive partial failures. A batch cannot survive an uncaught exception that terminates the whole run. If you're building any pipeline where structured objects come out of an LLM — a scanner, an import job, a report extractor — this is the default posture. Validate each item. Log what you drop. Keep moving.
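The same posture works for any model, not just ScanItem. A generic sketch, assuming Pydantic v2; validate_each, the Row model, and the "scanner" logger name are hypothetical names for the example.

```python
import logging
from typing import Any, TypeVar

from pydantic import BaseModel, ValidationError

logger = logging.getLogger("scanner")  # hypothetical logger name
T = TypeVar("T", bound=BaseModel)


def validate_each(
    model: type[T], raw_items: list[Any]
) -> tuple[list[T], list[tuple[int, str]]]:
    """Validate items one at a time; return (valid items, dropped items)."""
    valid: list[T] = []
    dropped: list[tuple[int, str]] = []
    for index, raw in enumerate(raw_items):
        try:
            valid.append(model.model_validate(raw))
        except ValidationError as e:
            # Record what was dropped and why, then keep going.
            dropped.append((index, str(e)))
            logger.warning("Dropping item %d: %s", index, e)
    if dropped:
        logger.warning("Dropped %d of %d items", len(dropped), len(raw_items))
    return valid, dropped


class Row(BaseModel):
    # Hypothetical model for the usage example.
    name: str
    amount: int


rows, dropped = validate_each(
    Row,
    [
        {"name": "a", "amount": 1},
        {"name": "b"},  # missing required field
        {"name": "c", "amount": "not a number"},
        {"name": "d", "amount": 4},
    ],
)
```

Returning the dropped items alongside the valid ones means the caller can decide what to do with them (count them, alert on them, retry them) instead of only finding out from the log.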
Why This One Stuck
Honestly, this took me longer to fix than it should have because the failure mode was invisible.
No exception in my face. No red in the logs. Just… nothing surfaced. And I'd go looking for a prompt issue, a model issue, a network issue — and not find it.
The real bug was structural. I'd given a single bad item the power to kill the whole run.
Once I saw it that way, the fix was obvious. But I had to hit the silent failure a few times before I actually went looking at the construction layer.
That's the one I'll remember: silence in a pipeline is not success. It might be a validation error eating your whole batch.
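One way to make that silence loud is a small health check after each run. This is a sketch of the idea, not something from the scanner itself; check_batch_health, EmptyBatchError, and the 0.5 threshold are assumptions for illustration.

```python
import logging

logger = logging.getLogger("scanner")  # hypothetical logger name


class EmptyBatchError(RuntimeError):
    """Raised when a non-empty input produced no valid items."""


def check_batch_health(
    received: int, kept: int, min_keep_ratio: float = 0.5
) -> None:
    """Fail loudly when a run kept nothing; log when it kept very little."""
    if received == 0:
        return  # nothing came in, so an empty result is expected
    if kept == 0:
        # Input existed but nothing survived validation: treat as a failure.
        raise EmptyBatchError(f"0 of {received} items survived validation")
    if kept / received < min_keep_ratio:
        logger.error(
            "Only %d of %d items survived validation", kept, received
        )
```

Called with the counts from a run, this turns "produced nothing" into an exception that whatever supervises the pipeline can see.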
P.S. If you're using Pydantic v2, model_validate() in a try/except loop is the move. The old parse_obj() still works in v2 but is deprecated, so don't get caught porting a v1 pattern and wondering where the deprecation warnings are coming from.