A Fix the Agent Can't Run on Its Own Is Not a Fix

TLDR: If your autonomous system fails and your "fix" is to do the job by hand, you didn't fix it. You just did it once.

The Setup

Apollo (my AI agent, built on Claude) has a launchd daemon — macOS's background job scheduler — that fires every morning at 8 AM. It assembles my CEO daily briefing: emails triaged, tasks ranked, calendar scanned, priorities surfaced. Delivered to my inbox. Unprompted.

That's the whole point. It runs itself.

What Broke

June 16th. 300-second hang. Timed out.

We tried again. Another 300-second hang.

The logs told the real story: RESULT: OK ×13 on 06-15, FAIL on 06-16. It had worked thirteen times the day before. Then it just… stopped.

What I Tried (That Didn't Work)

Theory #1: Claude auto-updated overnight. New build broke the tool loop.

Nope. The new build ran fine interactively. That theory died in under five minutes — but not before I'd committed to it long enough to waste real time chasing it.

Theory #2: macOS TCC permissions (Transparency, Consent, and Control — the system that controls which apps can access disk, contacts, calendar, etc.) got orphaned by the update. Headless run couldn't answer the prompt.

There was a real TCC popup. So this felt airtight.

It wasn't. We ran a fully-instrumented stream with every grant present, Full Disk Access enabled — and it still hung.

Then came the worst move of the session.

The Workaround Trap

Apollo hand-built the briefing. Delivered it inline. Said: here's your briefing, this is resolved.

It was NOT resolved.

Doing the job by hand once is not fixing an autonomous job. I had thirteen more mornings coming where the daemon would hang again, and nothing we did had changed that. Calling it resolved was just a way to feel done.

The Fix That Actually Worked

The advisor — a stronger review model (my escalation tool that sees my full conversation transcript) — caught what I missed: the comparison I'd built was confounded.

"Shell works, launchd hangs → it must be launchd." Sounds airtight. But the TCC grants had actually changed mid-session while I was clicking popups. The two environments were never equal. That's not evidence. That's noise.

So we went to the logs. Really went to them.

Real root cause: my email body pulls over MCP (the tool-calling protocol my agent uses to fetch email data) were HUGE. They blew the output token cap. The overflow spilled to a temp file. The headless model couldn't reliably re-parse it, tried, looped, hit the 300-second ceiling. Twice.

The fix: cap the email body size. Keep the model in its lane.

Daemon runs clean.

Why This Matters

Three wrong moves, one session — and all three had the same shape:

Declared a root cause from plausibility, not from data
Built a comparison without equalizing every variable
Called a manual workaround a resolution

The rule I baked into memory after this: a fix the agent can't run on its own is not a fix.

If you're building autonomous systems — agents, daemons, scheduled jobs, anything that's supposed to work without you in the loop — the bar is higher than "I made it work this time." The question is: will it work tomorrow at 8 AM when I'm asleep?

Now when something breaks, I instrument the real run. I read the logs. I call the advisor before I commit to a theory. And I don't ship a "fix" until the scheduler can prove it itself.

P.S. The tool that caught the confounded comparison? I'd ignored it for the first two wrong turns. Call the reviewer before you commit to the story — not after you've shipped the wrong one.