Why My AI Agents Only Get Western Inference (But I Don't Have To)

TLDR: I split my AI infrastructure along the consent axis. Daemons that fire without me → Western providers only. Me at the keyboard → my call, per-call.

The Billing Gun That Made Me Think

On June 15, 2026, Anthropic changed how claude -p (headless, non-interactive Claude Code) gets billed.

Everything autonomous — background agents, daemons, scheduled runners — moves off the flat subscription onto a separate API credit pool at full API rates.

So I had to migrate Apollo's autonomous daemons to cheaper inference anyway. The iMessage listener, the email scanner, the precall research agent, the RAG synthesis layer. All of them.

The Obvious (Wrong) First Move

I already use /minimax (MiniMax M2.7) and /zai (GLM-5.1 via Z.AI) interactively. They're fast, cheap, and genuinely capable. Great sub-agents for banging out code when I'm at the keyboard.

So my first instinct: just route the daemons through those same providers.

Cost problem solved.

I sat with it for about five minutes before something felt off.

The Real Question

Daemons fire without me.

That's the whole point of them: iMessage Apollo reads and replies to my message threads with my wife and the kids while I'm working. The scanner classifies every email that hits my inbox. The precall agent pulls from Fathom transcripts, my Obsidian vault, years of notes.

None of that has per-call consent.

"All my data, every day, to a foreign jurisdiction" is a qualitatively different thing from "I'm choosing to send this code snippet to MiniMax to write some unit tests."

When I send a snippet interactively, I made a decision. Scope is narrow. I'm there.

When a daemon fires at 2am and routes my iMessage thread through Chinese infrastructure… I didn't decide anything that morning.

The Nuance That Actually Matters

My first instinct was "ban Chinese models for daemons." Too blunt.

The real line is inference jurisdiction, not model origin.

Qwen 3.6 weights via Together AI (US-hosted infrastructure)? Fine for daemons. Qwen 3.6 via Alibaba's own API? Not fine.

Same model. Completely different answer.

Open-source weights are open-source weights — the question is which company's infrastructure holds the data during inference.

That unlocked a useful allowed list:

OK for daemons: Anthropic, OpenAI, Google, Together AI, Fireworks AI, Groq, OpenRouter (with Western backend)
Interactive-only: MiniMax, Z.AI, DeepSeek direct, Kimi direct, Qwen via Alibaba
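
As a minimal sketch, the list above could be encoded as a default-deny check keyed on the inference host rather than the model. The provider labels, the Route type, and daemon_allowed are hypothetical names for illustration. Treating any unlisted host (or an aggregator with no pinned backend) as denied is an assumption, not something the list itself states.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the "OK for daemons" hosts listed above.
DAEMON_OK = frozenset(
    {"anthropic", "openai", "google", "together", "fireworks", "groq"}
)


@dataclass(frozen=True)
class Route:
    model: str                     # weights being served, e.g. "qwen-3.6"
    provider: str                  # company whose infrastructure runs inference
    backend: Optional[str] = None  # upstream host when provider is an aggregator


def daemon_allowed(route: Route) -> bool:
    """Return True if an unattended daemon may use this route.

    The model name is deliberately ignored: only the inference host counts.
    """
    host: Optional[str] = route.provider
    if host == "openrouter":
        # An aggregator is judged by the backend it forwards to.
        # With no backend pinned, host is None and the check fails.
        host = route.backend
    return host in DAEMON_OK
```

Under this sketch, Route("qwen-3.6", "together") passes and Route("qwen-3.6", "alibaba") does not, even though the model field is identical.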

The Decision

I was offered three options when designing the migration: any provider OK / Western for iMessage only / Western everywhere.

I picked the strictest.

And to be clear — this is not a moral judgment on the providers. A developer I know runs MiniMax, Ollama, and DeepSeek extensively and I respect that pattern completely. The rule is specifically about autonomous exposure of my most sensitive data without per-call review.

The Rule I Carry Forward

Set data policy on the consent axis, not the cost axis.

If I'm at the keyboard making an active decision about what goes where — that's my call. Per-call. I can weigh cost, capability, speed, all of it.

If something fires autonomously with no per-call consent and handles aggregate sensitive data — Western providers only. Period.
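
One way to make that rule hard to bypass is a guard that every outbound call passes through. This is a sketch under assumptions: Caller, ConsentError, check_route, and the provider labels are all hypothetical names, and the allowlist is only an example of what "Western providers" might resolve to.

```python
from enum import Enum


class Caller(Enum):
    INTERACTIVE = "interactive"  # a person at the keyboard chose this call
    AUTONOMOUS = "autonomous"    # daemon, scheduled runner, background agent


# Hypothetical allowlist of providers cleared for unattended use.
WESTERN_HOSTED = frozenset(
    {"anthropic", "openai", "google", "together", "fireworks", "groq"}
)


class ConsentError(RuntimeError):
    """An unattended caller targeted a provider that needs per-call consent."""


def check_route(caller: Caller, provider: str) -> str:
    """Return the provider if the call is permitted, otherwise raise."""
    if caller is Caller.AUTONOMOUS and provider not in WESTERN_HOSTED:
        raise ConsentError(
            f"{provider!r} requires per-call consent; refusing autonomous call"
        )
    # Interactive calls always pass: the person present is the consent.
    return provider
```

Raising instead of silently falling back to another provider is a design choice here: a misconfigured daemon fails loudly at 2am instead of quietly sending data somewhere nobody approved.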

The cost savings aren't worth the exposure I never consciously agreed to.

That's the one thing I'd tell any builder running autonomous agents: before you optimize for cost, ask yourself whether you're actually there when the data moves.