TLDR: When Anthropic moved Agent SDK usage off the Claude Max subscription onto a metered credit pool, I had one day to rethink the email triage in Apollo Dashboard. The fix wasn't "drop the Agent SDK." It was routing its model calls through local Ollama instead — and the architecture is actually better for it.

The Setup

A few weeks ago I shipped the email ingest for Apollo Dashboard (my local-only OS cockpit — a Python app that runs at localhost and syncs calendar + email into state/*.json files the dashboard reads). The triage step classifies up to 60 inbound emails across three accounts — an ecommerce business, a consulting client, and a services client — into action / fyi / noise and ranks them by urgency.

The original commit message said it straight: "refactor: email triage via Claude Agent SDK (subscription) — zero metered billing."

That framing was correct for about three days.

The Wall

On June 15, 2026, Anthropic split billing. Verbatim from their help center:

"Starting June 15, 2026, Claude Agent SDK and claude -p usage no longer counts toward your Claude plan's usage limits."

It moves to a separate metered credit pool at full API rates. No rollover.

I knew this was coming — the announcement dropped in late May, and I'd already modeled the blast radius for Apollo's daemon stack (the Opus-heavy daemons alone projected a monthly cost far beyond what that pool could absorb; that's… not fitting). Email triage got caught in the same rethink even though Haiku is cheap: the PRINCIPLE was wrong. "Free on subscription" wasn't the baseline anymore.

What I Actually Changed

The obvious move would be: drop the Agent SDK, call the raw Anthropic API, watch the bill.

I went the other direction.

The Agent SDK already handled all the plumbing — structured async iteration, AssistantMessage parsing, ClaudeAgentOptions for non-interactive mode. I didn't want to rewrite that. What I wanted was to route the inference somewhere off-pool.

So I added a _route_for_model context manager. Any model name ending in :cloud gets its environment swapped for the duration of the call:

os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:11434"
os.environ["ANTHROPIC_API_KEY"] = "ollama"

Local Ollama (my M4 Max running glm-5:cloud) handles the call. The Agent SDK never knows. Belt-and-suspenders at module load:

os.environ.pop("ANTHROPIC_API_KEY", None)

If the key isn't there, nothing can accidentally bill. Not a race, not a stray import, not a test runner.

The model line in the ingest now reads:

EMAIL_MODEL = "glm-5:cloud"  # migrated 2026-06-14: on par with Haiku, reliable, off-pool.

The Part That Stayed on Subscription

Thread summaries are different. For each email that surfaces as action, I call claude-sonnet-4-6 via claude -p subprocess to generate a 14-word one-line summary. That path is the interactive binary — it's explicitly unaffected by the June 15 split.

Two different calls. Two different billing paths. The split in the architecture maps directly to the split Anthropic made.

Why This Matters

The billing change forced a question I should have been asking anyway: which calls actually need Anthropic's models, and which ones are "good enough" locally?

Triage — action/fyi/noise classification on 60 email subjects and snippets — is a task where a solid local model running on your own machine is more than sufficient. The latency is fine. The quality is fine. And it never touches the network.

Thread summarization — one Sonnet call per enriched thread, cached and only re-fired when the thread gets a new message — is where Sonnet's coherence earns its keep.

Model choice and billing path ended up pointing at the same answer. That's a good sign you got it right.

P.S. The :cloud naming convention is my own — it signals "route to Ollama, never bill" throughout the ingest. Makes grepping for off-pool calls trivial.