TLDR: A pure-stdlib Python server can use Claude Haiku for email triage and a local MLX model for task ranking without importing the SDK. The rule: the runtime never imports a model. Intelligence runs out-of-process — as a scheduled ingest job that writes JSON, or as a sidecar the server talks to over HTTP but never touches directly.
the setup
My personal dashboard is a personal cockpit — Tasks, Calendar, Email, System — that launches from my Dock and lives at localhost:7842.
One hard constraint from day one: pure Python stdlib. No pip install at runtime. Zero deps. If my internet goes down, the server still comes up.
the wall
I wanted real intelligence in here.
Claude Haiku (Anthropic's fast, cheap model) scanning my inbox and flagging what actually needs a reply. A local model ranking which of my Things 3 (my task manager) tasks I should touch today.
The problem isn't calling a model. That part's EASY. The problem is where you call it.
Put a model call in your request handler and every /email page load is now a metered API hit. Slow. Expensive. And you just added a non-stdlib dep to what was supposed to be a zero-dep server.
the ingest pattern
The fix: pull all the intelligence out of the runtime entirely.
I wired up a separate hourly launchd (macOS's job scheduler) ingest job. That job does all the LLM work — Haiku triage via the Claude Agent SDK, calendar events via Arcade (my OAuth/MCP auth provider) — and writes results to state/email.json and state/calendar.json.
The dashboard server reads those files. That's it.
launchd ingest job (hourly) → state/*.json → pure-stdlib server (reads only)
Email triage was actually a refactor — I'd wired it more directly first, hit the billing concern, and moved it to the subscription OAuth path inside the ingest job so at the time I built it, it drew from my flat subscription rather than metered credits per call.
the claude -p trick
For the task ranker I wanted Claude available as a backend — triggered by a manual Re-rank button, not a page load. Still couldn't import the SDK.
Answer: shell out to the claude binary directly.
claude -p --output-format json --model claude-haiku \
--setting-sources "" --strict-mcp-config \
"<ranking prompt>"
--setting-sources "" strips project config. --strict-mcp-config blocks MCP server connections. Run from /tmp so no CLAUDE.md gets picked up. The caller stays pure stdlib — no SDK import, no dep. The model does its thing. You parse the JSON.
the local inference sidecar
Same principle for my local MLX inference setup (running Apple's ML framework for Apple Silicon): it runs in its own isolated venv as a sidecar process, launched on demand, queried over HTTP.
The server starts it, health-checks it, talks to it. Never imports it.
why this matters to me
There's always a pull toward "just put the model call here — it's one function."
But once you import the SDK into your server, you've coupled your runtime to your AI layer. Billing changes, model swaps, SDK updates… all become your server's problem.
The better rule: the runtime never imports a model. Intelligence lives out-of-process and communicates over JSON or HTTP. Three instances, same principle — scheduled ingest writes JSON, claude -p subprocess serves the manual button, local MLX inference sidecar handles local inference.
The server reads. It doesn't think. Build it that way from day one.