TLDR: Poll your live data fast. Cache the LLM's ranked order separately. Invalidate the cache on an explicit trigger — not on every data change.

The Apollo Dashboard is my personal cockpit — a Python localhost server I keep in my Dock that surfaces everything I need to run my day.

One of those things: my Today list from Things 3, my task manager.

I didn't just want the tasks dumped in Things' default order. I wanted them ranked — what actually matters most right now, scored by something smarter than position-in-list.

So I wired in Ollama (my local LLM runtime, running fully on-device), fed it my Today tasks, and asked it to sort them by priority.

And then I ran into the wall.

The Wall: 40 Seconds Per Rank

The dashboard polls Things 3 via AppleScript every 15 seconds to keep the task list live.
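A minimal sketch of what that poll could look like, not the dashboard's actual code. It assumes the Things 3 scripting dictionary exposes `to dos of list "Today"` with `id` and `name` properties, and that `osascript` is on the PATH (macOS only). The names `TODAY_SCRIPT`, `parse_today`, and `fetch_today` are made up for illustration.

```python
import subprocess

# AppleScript that emits one "id<TAB>name" line per Today task.
# Assumption: Things 3 exposes `to dos of list "Today"` with `id` and `name`.
TODAY_SCRIPT = '''
set out to ""
tell application "Things3"
    repeat with t in to dos of list "Today"
        set out to out & (id of t) & tab & (name of t) & linefeed
    end repeat
end tell
return out
'''

def parse_today(raw: str) -> list:
    """Turn tab-separated osascript output into [{'id': ..., 'name': ...}]."""
    tasks = []
    for line in raw.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        task_id, name = line.split("\t", 1)
        tasks.append({"id": task_id.strip(), "name": name.strip()})
    return tasks

def fetch_today(timeout: float = 10.0) -> list:
    """Run the script once via osascript and parse the result."""
    result = subprocess.run(
        ["osascript", "-e", TODAY_SCRIPT],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return parse_today(result.stdout)
```

A timer or background thread would call `fetch_today()` every 15 seconds and swap the result in as the live list. Keeping the parser separate from the subprocess call makes it testable without Things installed.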

Ranking via the LLM was a separate problem — and a slow one.

I sent the model my ~63 tasks and asked for a priority order, keyed by the real task IDs: UUID-style strings, 22 characters each.

A UUID is ~10-12 tokens. Emitting 63 of them back is roughly 630 to 760 output tokens before any JSON punctuation. On qwen2.5:7b-instruct, the ranking call took ~40 seconds to complete, plus a ~20-second cold load if the model had unloaded since the last call.

A minute per re-rank. On a 15-second poll loop.

Not happening.

The Integer Fix: Same Task, 2x Faster

I went back and read the math more carefully.

The model was slow because it was generating tokens, not because it was "thinking hard." Every character in a UUID costs tokens on the way out.

Fix: assign each task a short integer ID at prompt time (1, 2, 3, and so on) and map it back to the real ID server-side after the response.

An integer is 1-3 tokens. Not 12.

Same model. Same instructions. Same tasks. Just integers instead of UUIDs.

Result: ~18 seconds. Consistently. The entire gain comes from emitting fewer output tokens.
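The aliasing step can be sketched in a few lines. This is an illustration under assumptions, not the dashboard's code: the function names, the `{"id", "name"}` task shape, and the "drop anything unexpected" policy are all hypothetical.

```python
def alias_tasks(tasks):
    """Assign 1-based integer aliases; return (prompt_listing, alias_to_id)."""
    alias_to_id = {}
    lines = []
    for alias, task in enumerate(tasks, start=1):
        alias_to_id[alias] = task["id"]
        lines.append(f"{alias}. {task['name']}")
    return "\n".join(lines), alias_to_id

def resolve_order(order, alias_to_id):
    """Map the model's integer order back to real IDs.

    Unknown aliases, non-integers, and duplicates are silently dropped
    (an assumed policy; a stricter server might reject the response).
    """
    seen = set()
    real_ids = []
    for alias in order:
        if isinstance(alias, bool) or not isinstance(alias, int):
            continue
        if alias in alias_to_id and alias not in seen:
            seen.add(alias)
            real_ids.append(alias_to_id[alias])
    return real_ids
```

The model only ever sees and emits the short numbers. The long IDs never enter the prompt, so they never cost output tokens.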

(I also lock in options.temperature ~0.2 and format: json with a strict "Return ONLY JSON: {...}" instruction. Stable, machine-parseable output is worth a lot here.)
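For reference, here is roughly what such a request looks like against Ollama's `/api/generate` endpoint. The `format`, `stream`, and `options.temperature` fields are part of Ollama's API. The prompt wording and the `{"order": [...]}` schema are assumptions for illustration, as are the function names. The default port 11434 is assumed.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_rank_request(task_listing: str, model: str = "qwen2.5:7b-instruct") -> dict:
    """Build the JSON payload for a single non-streaming ranking call."""
    prompt = (
        "Rank these tasks by priority, most important first.\n"
        f"{task_listing}\n"
        # Hypothetical schema; the real instruction is not shown in the post.
        'Return ONLY JSON: {"order": [<task numbers>]}'
    )
    return {
        "model": model,
        "prompt": prompt,
        "format": "json",                 # constrain output to valid JSON
        "stream": False,                  # one response body, not chunks
        "options": {"temperature": 0.2},  # low temperature for stable output
    }

def call_ollama(payload: dict, timeout: float = 120.0) -> dict:
    """POST the payload and parse the model's JSON text from the 'response' field."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return json.loads(body["response"])
```

The generous timeout is deliberate: a cold model load plus generation can run well past the defaults many HTTP clients ship with.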

The Real Design: Separate Membership from Rank

Even 18 seconds is too slow to block on every poll. So I had to invert the normal caching instinct.

Normally you invalidate a cache when the data changes.

Here I did the opposite.

The AppleScript poll owns membership — what tasks exist right now. Check something off in Things 3? Gone immediately from the dashboard. Add a new task? Shows up in the next 15-second tick.

The LLM response owns rank order — a sort key I apply over the live membership list.

Those two layers are fully decoupled.

The rank cache does NOT invalidate when you check a task off. It invalidates only on an explicit Re-rank button click or a server restart.

That inversion is what makes the whole thing feel fast. The list stays live and reactive. The expensive priority judgment only runs when you actually ask for it.
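Here is a minimal sketch of the two layers, under assumptions. The `RankCache` class and its method names are hypothetical, and sending unranked tasks to the bottom is one reasonable policy that the design above does not spell out.

```python
class RankCache:
    """Holds the last LLM ordering as task id -> position.

    Polls never touch it. Only store() replaces it, and store() would be
    called from the Re-rank handler.
    """

    def __init__(self):
        self._position = {}

    def store(self, ordered_ids):
        """Replace the cached ranking with a fresh one from the model."""
        self._position = {task_id: i for i, task_id in enumerate(ordered_ids)}

    def apply(self, live_tasks):
        """Sort the live membership list by cached rank.

        Tasks the model has never seen sort after all ranked tasks and keep
        their poll order, because sorted() is stable.
        """
        unranked = len(self._position)
        return sorted(live_tasks, key=lambda t: self._position.get(t["id"], unranked))
```

Checked-off tasks need no special handling: they are missing from the next poll result, so `apply()` never sees them. Their stale cache entries are harmless until the next re-rank overwrites them.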

One More Gotcha

Even with integer IDs, small models leak task IDs into rationale fields. The model would correctly emit integers in the order array, and then write "Task 7 is urgent because…" somewhere else, putting a bare integer in a field where a human expects to see the task's name.

Don't rely on the prompt to fully contain this.

I added a deterministic post-process step: after the response lands, scan every field for known id substrings and replace them with the task's actual name. Belt and suspenders.
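One way to write that scrubber, as a sketch under assumptions. This version matches "Task N" references as whole numbers instead of raw substrings, so alias 1 cannot fire inside 17. That is a variation on the approach described above, not necessarily the exact rule the dashboard uses. The regex and the `scrub` function are hypothetical.

```python
import re

# Matches "Task 7", "task 7", "Task #7" with the number as a whole token.
TASK_REF = re.compile(r"\b[Tt]ask\s+#?(\d+)\b")

def scrub(value, alias_to_name):
    """Recursively replace 'Task N' references in every string field.

    Non-string values (such as the integer order array) pass through
    untouched. Unknown numbers are left as-is.
    """
    if isinstance(value, str):
        def swap(match):
            name = alias_to_name.get(int(match.group(1)))
            return f'"{name}"' if name else match.group(0)
        return TASK_REF.sub(swap, value)
    if isinstance(value, list):
        return [scrub(v, alias_to_name) for v in value]
    if isinstance(value, dict):
        return {k: scrub(v, alias_to_name) for k, v in value.items()}
    return value
```

Because the pass is deterministic, it behaves the same on every response regardless of how well the model followed the prompt.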

Why This Stuck With Me

The lesson generalizes pretty far.

A model's ranked order is a judgment, not a fact. Facts change constantly. Judgments can be cached. Wire the expensive inference to an explicit trigger, apply its output as a sort key over your fast-changing live data, and the system feels responsive without paying for every poll.

Triage, recommendations, summaries — anything where the shape of a decision stays valid longer than the raw data it was made from. This pattern shows up everywhere once you see it.