TLDR: Dropped a literal {channel_id} placeholder into a Python f-string to guide an LLM. Python treated it as a variable, crashed every run, and my agent went dark for 44 hours — while the scheduler just… kept retrying.

The Setup

I run Apollo, my AI agent, on a Python scanner that sweeps my Slack and Fathom (my meeting recorder) transcripts and turns them into tasks in Things 3, my task manager.

Every run, it calls build_scan_prompt — a function that assembles a big f-string and ships it to Claude Sonnet for extraction.

It had been working perfectly for weeks.

What Broke It

I made a dedup fix (commit 4ea7546). I wanted Sonnet to understand the exact format of a raw ID, so I added this to the prompt instructions inside collector.py:

Use the format {channel_id}:{ts} as the raw_id for Slack messages.

Totally reasonable thing to write.

Completely broke everything.

Because those lines live inside build_scan_prompt, which starts with return f""".... In an f-string, every {...} is treated as a variable interpolation — no exceptions, no context. Python saw {channel_id} and went looking for a variable named channel_id.

There wasn't one. NameError: name 'channel_id' is not defined. Script dead on line 1.

The Part That Made It Worse

I use launchd (macOS's background scheduler, like cron) to run the scanner every hour.

launchd does not know or care why a script failed. It just knows it exited non-zero. So it waited an hour and launched it again.

And again. And again.

213 times. Over 44 hours. Zero tasks captured.

The only reason I noticed was that tasks just… stopped appearing in Things. I went looking, found 213 identical NameError stack traces in the log, and it hit me immediately. Two characters. That's what it was.

The Fix

Double the braces. That tells Python: emit a literal {, don't try to interpolate.

# Before (crashes)
Use the format {channel_id}:{ts} as the raw_id.

# After (correct)
Use the format {{channel_id}}:{{ts}} as the raw_id.

The f-string renders those as {channel_id}:{ts} in the final string — exactly what Sonnet needs to read.

Fix was in commit 51b8054. End-to-end smoke run captured 2 actionable items + 12 dropped. Back online.

What I Do Now

The real lesson isn't brace doubling — I'll probably forget and do it again someday.

The lesson is: a static smoke test would have caught this in 5 seconds.

python -c "from src.collector import build_scan_prompt; build_scan_prompt({})"

That's it. If the function constructs without raising, you're fine. I now run this before any deploy to either scanner. The Fathom scanner's gotcha doc literally says: "The 2026-04-24 44-hour outage was caused by a brace bug. Static smoke caught this once at build time."

213 retries and a missed day of task capture taught me something no unit test had: the code path that builds the prompt is the most fragile path in the whole system — because it's where Python's string mechanics and the LLM's expected syntax live in the same line of code.

Test it explicitly. Every time.

P.S. If you're building anything like this — a Python function that assembles an LLM prompt with literal examples of JSON, template syntax, or variable placeholders — just assume there's a brace that's going to bite you. Smoke test the builder. It takes 5 seconds. The outage doesn't.