How I Got 12/12 Clean Sections Out of a Batched LLM Pipeline

TLDR: When you batch LLM calls to generate dense structured output, four things will break on you — and they'll break in sequence, one crash at a time. Here's all four.

The Job

I've been building /shopify-clone — a skill that screenshots a live Shopify store, reverse-engineers its layout block by block, and generates bespoke Liquid (Shopify's templating language) for each section using Claude.

Not a pattern-match. Not a template swap. A full LLM-driven code generator, 12 sections per clone, each section its own API call.

First clone target: a product page from a supplement ecommerce store — a subscription product with a selling plan.

It did not work the first time.

What Broke (In Order)

The pipeline kept returning garbage — malformed JSON, empty fields, silent failures mid-batch.

Each crash looked different. That was the tell. I wasn't dealing with one bug. I was hitting four separate reliability ceilings and discovering them sequentially, the hard way.

The Four Fixes

1. Switch to tool_use, not JSON parsing.

Asking the LLM to "return JSON in this shape" works until it doesn't. The model gets creative with whitespace, wraps it in markdown fences, adds commentary. Then your parser dies.

tool_use (forcing the model to call a structured tool definition instead of generating free-text JSON) removes that entire failure mode. The response either validates or it doesn't — no parsing gymnastics.

2. Set max_tokens=8192, not the default 4096.

This one was sneaky. The Anthropic SDK default is 4096. Dense Liquid — inline CSS, schema JSON, HTML — blows past that on complex sections.

The symptom? The tool gets called, but the liquid field comes back empty. The model hit the ceiling mid-generation and stopped. No error, just silence.

Fix: max_tokens=8192 as the default for any template-generating task. Log stop_reason in every error — if you see max_tokens there, you know exactly what happened.

3. Wrap every call in retry logic with exponential backoff.

Transient APIConnectionError and APITimeoutError happen. Not often — but 12 blocks × up to 3 attempts per block ≈ 36 API calls. At 1-3% flake rate, one network error per batch is essentially guaranteed.

The SDK's built-in retry does NOT cover these exceptions. One flake with no retry crashes the whole pipeline and you lose everything generated before it.

for network_attempt in range(3):
    try:
        msg = client.messages.create(..., timeout=180.0)
        break
    except (APIConnectionError, APITimeoutError) as e:
        time.sleep(2 ** network_attempt)

Three lines. Saves the whole run.

4. Auto-strip empty defaults before submitting the section.

The LLM sometimes returns fields with empty strings or null values that conflict with Shopify's schema validation. A post-process pass that strips those out before writing the file kept the generated theme from throwing on preview.

Minor fix. But it was the last thing standing between "11/12" and "12/12 clean."

Why This Matters

I paid the cost of discovering each of these in sequence — one crash, one fix, re-run, next crash.

If you're building anything that batches client.messages.create() for structured output — a code generator, a bulk rewriter, a data extractor — apply all four from the start.

Don't wait to rediscover them at 2am with half a pipeline down.