Why Apollo Was Starting Every Session With Amnesia — And the Recency Cache That Fixed It

TLDR: Apollo, my Claude-backed dev agent, was booting with full identity and rules — but zero awareness of what we'd shipped last session. A single rolling file called hot.md fixed that. Here's why I built it the way I did.

the gap

Every time I opened a new session with Apollo (my Claude-backed AI dev agent), it knew who it was immediately.

Rules loaded. Behavior prefs loaded. The whole routing table — which tool to reach for, which projects existed, which repos to touch — all of it loaded from memory.

But recent context? Nothing.

It had no idea that we'd shipped the CEO briefing retry-loop fix yesterday. No idea the promotion of my lab site was mid-flight. No idea a real estate investment client proposal was sitting on a decision from the client.

Every session started cold — and I was reconstructing state out loud, either from git log or just by telling Apollo what we'd been doing. Annoying at best. Slow EVERY time.

the idea I stole

I was doing a deep comparison of claude-obsidian (a community Claude Code memory pattern built around an Obsidian vault) versus Apollo's memory architecture. Most of what I found wasn't worth adopting. But one thing was GENIUS.

Claude-obsidian kept a file called hot.md — a rolling recency cache that got refreshed on session end via a model-summary hook. Boot read it and the agent woke up knowing what was in-flight.

That was the gap I had in spades.

how I built it differently

The obvious copy would've been the SessionEnd hook — let a model-summary call write the cache when the session closes.

I went the other direction.

The hook approach means metered LLM cost on every session end, plus daemon noise. I didn't want a background process with budget implications I couldn't see.

Instead: Apollo writes hot.md in-turn, the same step as inbox staging, at the end of any session where something actually shipped or a decision landed. Deterministic. Free. No hook required.

The write goes through a helper that prepends the new entry, enforces a budget (roughly the last 10 entries / ~500 words), and pops the oldest to memory/log.md — an append-only tail. Atomic. The trim is in code, not dependent on any daemon firing:

printf '## %s — <one-line summary>\n- Shipped: ...\n- Waiting on I: ...\n' "$(date '+%Y-%m-%d %H:%M')" \
  | (cd /path/to/apollo/obsidian-rag && uv run python -m apollo_rag.hooks.hot --prepend)

I did add a backstop hook as insurance — if a session ends without an explicit write, it appends a stamp from the transcript's last recap block. But the hook is insurance, not the primary writer.

the one rough edge

The backstop hook fires for every claude session end — including headless daemons.

Right now, the CEO briefing cron, the council runner, and the iMessage agent aren't gated with APOLLO_HEADLESS=1. The content self-gate (needs an actual recap block) makes accidental writes unlikely… but not impossible.

I haven't fixed that yet. It's the honest unfinished edge on this thing.

why it matters

The real lesson here isn't the file format or the helper command.

It's the distinction between ephemeral operational state and durable knowledge.

Durable knowledge — patterns I've learned, architecture decisions, what worked on a search tool for a cancer education business last month — lives in real memory files and gets promoted through the inbox.

hot.md is something different. It's just what's the live picture right now. Not knowledge. Not a learning. Just: what shipped, what's waiting on me, what's next.

Once I separated those two things clearly, the whole memory architecture clicked.

The sessions warm. The agent useful immediately. No catch-up.

That's worth the fifteen-minute build.

P.S. Most turns still write nothing to hot.md. Quick lookups, pure conversation, status checks — silence is the default. Only real state changes earn an entry. That discipline is the whole point.