TLDR: Pull the LLM into a scheduled ingest job. Let the runtime be dumb and fast. Bonus: the Claude Agent SDK runs on your subscription — zero metered billing per run.

the setup

I'm building the Apollo Dashboard — a personal cockpit that lives in my Mac's Dock and serves my day to me at localhost.

The whole premise is that the runtime stays simple: pure Python stdlib, no framework, no external network calls in the request path. Fast. Offline-capable. 127.0.0.1 only.

Then I added AI-powered email triage.

the design that made it work

The runtime never touches the LLM. Ever.

Instead, a separate macOS launchd job — com.apollo.dashboard-ingest — runs once an hour in the background. It's the only piece that talks to the outside world: Arcade (my MCP auth provider) for calendar data, and Claude Haiku via the Claude Agent SDK for email classification. When it's done, it writes everything it learned to state/*.json files on disk.

The runtime (server.py, pure stdlib) just reads those files. No API calls. No latency. No network dependency.

The handoff layer is a folder of JSON files. That's it.

the billing wall I hit first

Here's the part that forced a refactor: my first cut called the Anthropic API directly — metered per token.

Running Haiku triage once an hour on a full inbox isn't free. It's not expensive either, but the costs compound, and more importantly: I'm already paying for Claude. Using the raw API on top of a subscription felt wrong.

The fix was the refactor to the Claude Agent SDK, which runs against your subscription — not your per-token billing. The ingest job now calls Haiku as many times as I want and the cost is exactly nothing on top of what I'm already paying.

That one commit — refactor: email triage via Claude Agent SDK (subscription) — zero metered billing — is GENIUS in retrospect. I should've started there.

fail-closed by default

One principle I got right: precision over recall for the "needs my reply" view.

If Haiku can't confidently say an email requires my attention, it stays off the list. Not surfaced, not hedged — gone.

An empty list is better than a list with noise. The whole point of triage is that I can trust what surfaces. The moment it starts showing things I don't need to act on, I stop looking.

why this pattern is reusable

This is the split I want on every AI-assisted local tool:

  • Ingest job — talks to the outside world (APIs, LLMs, OAuth). Writes durable state to disk on a schedule.
  • Runtime — reads that state. Serves the UI. Stays fast, offline-capable, and dumb.

The LLM is the expensive part — slow and/or metered. Keep it in the job. The runtime is cheap — reads files and renders HTML. Keep it clean.

I can restart the dashboard, open it offline, refresh 20 times — it's instant every single time. The AI work happened an hour ago and it's sitting in a file waiting for me.

That's the architecture I want everywhere.