I Almost Built the Wrong Proxy. Then I Grepped the Binary.

TLDR: Stock claude speaks Anthropic /v1/messages natively — no proxy needed. If your inference provider does too (Ollama v0.14+ does), just set ANTHROPIC_BASE_URL and you're done.

the problem

I run four automated daemons.

iMessage assistant. Precall research. Calendar polling. A judge agent.

All of them hitting Anthropic's API. All of them eating credits.

And with a billing-structure change landing June 15th, I had about two weeks to get them off Anthropic entirely.

my locked plan (which was wrong)

My first move was claude-code-router — an MIT-licensed proxy (an open-source shim that sits behind ANTHROPIC_BASE_URL and translates Claude's calls to whatever model backend you want). Route the daemons through it, out to OpenRouter, onto Qwen 3.6. Slap a hard monthly spending cap on top.

I'd documented it. Scoped it. Scheduled it.

It felt like a real plan.

what a developer friend said

I got on a call with a developer friend over Fathom. He builds agents for a living.

He was not impressed.

"I would definitely not be using Claude for these kind of agents."

And then: "keep your code kind of neutral and do it at a routing layer" — fine, sure — but his main point was that the whole proxy layer was self-inflicted complexity. His recommendation: skip it entirely. Stock claude, pointed straight at Ollama Cloud (Ollama's hosted inference service, not local).

I wanted to believe him.

But I have a rule: verify before claiming. He's sharp. That doesn't mean I take his architecture opinions on faith.

the grep

I grepped the actual installed claude binary.

Not docs. Not a GitHub readme. The real 215 MB compiled binary sitting at ~/.local/share/claude/versions/<ver>.

Here's what came back:

v1/messages — 56 hits
v1/chat/completions — zero

That's it. No OpenAI-format path anywhere in the binary.

Stock claude speaks Anthropic's native Messages API, natively, always. Which means claude-code-router only earns its keep when you're targeting a provider that speaks OpenAI format and needs conversion. The moment your provider natively serves /v1/messages… the router is dead weight.

Ollama v0.14+ natively serves /v1/messages. THROUGH THE ROOF useful — and I had zero idea.

(Two weeks of planning. Thirty seconds to disprove it.)

the fix

Three env vars:

ANTHROPIC_BASE_URL=<ollama-cloud-endpoint>
ANTHROPIC_MODEL=glm-4.6:cloud
ANTHROPIC_AUTH_TOKEN=<ollama-api-key>

That's the whole migration. Stock claude routes straight to Ollama Cloud's /v1/messages endpoint. The :cloud suffix on the model name tells Ollama to run inference on their GPUs, not mine.

One real gotcha: Ollama's endpoint doesn't support count_tokens — the ?beta=true variant Claude Code calls for token budget tracking. There's a known hang (ollama#13949) where it leaves the server unresponsive. Build a wrapper check before you cut over. Seriously, don't skip it.

what shipped

All four daemons migrated off Anthropic by June 14th — one day before the deadline.

Ollama Cloud Pro, flat-rate.

I ran a bake-off to pick the winning model per agent — real production paths, blind Opus judge, hand-labeled gold sets. iMessage Apollo I deliberately kept on Anthropic Opus. That one earns the premium. Everything else chose itself.

why this matters to me

I spent two weeks designing a proxy I didn't need.

The grep took thirty seconds.

Before you build the bridge, check whether the river is actually there. If I'd grepped the binary on day one, I'd have skipped the entire router design phase and shipped two weeks earlier.

If you're routing Claude Code agents to a cheaper provider: check whether that provider natively serves /v1/messages. If it does — skip the proxy, set ANTHROPIC_BASE_URL, and go build something.

P.S. The claude-code-router plan is still documented as my fallback — for any provider that does speak OpenAI format, it's exactly the right tool. It's not bad. It just wasn't what I needed.