How a Billing Change Made Me Grep a Binary (and Simplify Everything)

Three changes needed:

"Brian" → a colleague (section heading + body)
$100/mo hard limit → drop the figure, reword
apollo-claude → "a shim script"

Everything else — claude-code-router, its GitHub URL, all platforms, all technical numbers — stays.

TLDR: Before you add a translation proxy, grep your binary. The tool you're wrapping might already speak the format you need.

The Bill That Changed Everything

On June 15, 2026, Anthropic split their subscription billing.

Everything headless — claude -p (running Claude in a pipeline), Agent SDK calls, GitHub Actions — moves OFF the flat subscription and into a separate credit pool billed at full API rates.

For most people? Fine. For me, not fine.

I run Apollo, my personal AI system, as a set of always-on daemons: an iMessage listener, a precall research agent, a nightly scanner, a RAG synthesis layer. Every one of them fires claude -p dozens of times a day.

The math got ugly fast.

Why "Just Call the API" Wasn't an Option

My first instinct: swap the daemons to raw OpenRouter API calls (OpenRouter, a model-routing marketplace). Cheaper, no subscription headaches.

Except these daemons aren't just doing text generation.

They're calling MCP tools (MCP, the Model Context Protocol — how Claude connects to external systems like my calendar or task manager). They're running Bash. They're doing file I/O. The entire harness is load-bearing, not just the model.

Strip out claude -p and you rip out the whole capability layer. Dead daemons.

The Proxy That Looked Like the Answer

Enter claude-code-router (https://github.com/musistudio/claude-code-router) — a community MIT-licensed proxy that intercepts Claude Code's model calls via ANTHROPIC_BASE_URL and re-routes them to OpenRouter, Ollama, Gemini, DeepSeek — while preserving the full harness.

MCP tools? Still work. Bash? Still works. The model doing inference changes; everything else stays.

I spent real time planning the full setup. A shim script. A budget check that halts before each call if I'm over cap. A hard monthly spending cap in my OpenRouter account. The whole thing.

It looked GREAT on paper.

What a Colleague Said (and What I Did Instead of Believing Him)

I shared the plan with a colleague for feedback.

He called the proxy "self-inflicted complexity."

His argument: the claude binary might already speak Anthropic's native format and be repointable via env vars. No proxy needed at all.

Now — I could've just taken that on faith. Or dug in and argued. I did neither.

I grepped the actual 215MB compiled binary.

56 vs. Zero

v1/messages: 56 hits
v1/chat/completions: 0 hits

Zero.

The claude binary has no OpenAI-format path whatsoever. claude-code-router exists to translate OpenAI-format providers into Anthropic format — but if my target providers (Ollama v0.14+, z.ai) already serve /v1/messages natively, the proxy is solving a problem I don't have.

The Fix That Was Already There

Set ANTHROPIC_BASE_URL to a native /v1/messages endpoint. Set the model env vars (ANTHROPIC_MODEL, ANTHROPIC_SMALL_FAST_MODEL). Run claude exactly as before.

The whole harness — MCP, Bash, file I/O, tool use — keeps working. Just cheaper inference behind it.

claude-code-router is now my documented fallback for when I genuinely need to route through an OpenAI-format provider. Everything else? Direct.

Why This Matters

The lesson isn't "don't use proxies." The lesson is: when someone challenges your plan, don't accept or reject on vibes.

I almost shipped an extra dependency, a budget shim, and a whole layer of operational complexity — because I assumed I needed a translation layer without checking what format my tool already speaks.

Thirty seconds of grep saved me weeks of maintaining infrastructure I never needed.

P.S. If you're running your own claude -p daemons, the env vars you want are ANTHROPIC_BASE_URL, ANTHROPIC_MODEL, and ANTHROPIC_SMALL_FAST_MODEL. That's it. No proxy required — as long as your endpoint serves /v1/messages.