TLDR: One LLM call per visual block, not one per page. The pattern library was a dead end.
The Build
I've been working on a skill that clones Shopify landing pages — scrape a URL, segment the DOM into semantic blocks, generate matching Liquid (Shopify's templating language) for each one, assemble into a deployable theme.
The goal is simple: rapid landing-page variants for our store without hand-building each one.
Hours instead of weeks per variant.
v0.1 Was Humbling
First version used a pattern library — a handful of canonical templates (hero, buy-box, FAQ, generic-container fallback) that the code would match against each block and stamp out.
I looked at the output for a product landing page, cloned from a health supplement brand's site, and it looked like absolutely nothing like the brand.
Flat. Generic. AI-slop aesthetic.
The problem was obvious in hindsight: every real product page has 10+ visually distinct sections. Pre-canned patterns can never cover that long tail.
Option B
Three options on the table: bigger pattern library, per-block LLM generation, or a hybrid.
I went with Option B — one Anthropic API call per block, generating full Liquid + scoped CSS + schema from scratch.
The pipeline is five stages: scrape.py → fetch_product.py → segment.py → bespoke.py → generate.py.
The interesting one is bespoke.py. For every block the segmenter identifies, it fires a single LLM call with an emit_section tool, validates the output, retries up to 2× with error feedback, and falls back to a stub only if it still fails after that.
That last part matters. The pipeline can't stall because one section is weird.
What Broke Inside That (And the Fixes)
Running it the first time, I hit three separate failure modes.
-
JSON parsing instead of
tool_use— I was asking the model to return structured JSON in prose. It drifted constantly. Switching to Anthropic'stool_useAPI feature (force the model to call a defined tool with typed fields) killed the parsing errors entirely. -
max_tokens=4096was silently truncating — dense sections (HTML + inline CSS + schema) can easily exceed 4096 tokens. The failure mode was subtle: the tool was called, but theliquidfield came back empty. The model had been cut off mid-generation. Fix: default to 8192. Logstop_reasonon every error so truncation is actually diagnosable. -
APIConnectionErroron long runs — transient network hiccups, no retry logic, entire run dies. Added exponential backoff. Boring fix, completely necessary.
After those three: 12/12 sections generated clean. No stubs, no empty fields, no schema errors.
I also added auto-stripping of "default": "" entries from the schema — Shopify throws a validation warning on those, and the LLM sprinkles them everywhere by default.
Why This Matters to Me
The visual review of v0.2 is still pending as of when I wrote this — so I'm not going to tell you it looks like the brand yet. What I can say is the machinery is solid.
The deeper lesson isn't really about Shopify.
When you're asking an LLM to generate structured code at scale, three things matter more than your prompt: use tool_use so the output is typed, not parsed; set max_tokens high enough that complex outputs don't truncate silently; and add retry-with-backoff because network errors will eat your run.
Get those three right and the LLM's actual quality becomes the variable — which is the right problem to have.
P.S. The block segmenter still sometimes falls back to sequence names like
blk03,blk08when it can't pick a clean semantic hint. Cosmetic. I'll fix it when it annoys me enough.