TLDR: Neither pure semantic search nor pure keyword search is enough. You need both, fused — and I had to build each one twice to figure that out.

The Setup

Apollo (my AI assistant, built on the Claude Agent SDK) has a tiered memory system.

Tier 1 is a routing table — a tiny MEMORY.md file loaded every session. Tier 2 is a full reference file, read on demand. Tier 3 is the fuzzy part: how do you find something when you don't know exactly what file it's in or what words it uses?

My first answer was grep.

Why Grep Broke Down

Grep is great right up until it isn't.

It finds what you already know to search for.

If I ask "what's the rule on deploys?" and the memory file says "ship to prod," grep returns nothing. That is a silent false negative: no error, no warning, just an empty result that looks like "nothing exists."
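
A minimal sketch of that failure, for illustration only. The memory text and the grep_lines helper are made up for this example; they are not Apollo's actual files or code.

```python
import re


def grep_lines(pattern: str, text: str) -> list[str]:
    """Return the lines matching a case-insensitive regex, roughly like `grep -i`."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line for line in text.splitlines() if rx.search(line)]


# Hypothetical memory file contents: the rule exists, but never uses the word "deploy".
memory = """\
Rule: never ship to prod on Fridays.
Rule: tag every release before it goes out.
"""

print(grep_lines(r"deploy", memory))        # [] -> looks like "no rule exists"
print(grep_lines(r"ship to prod", memory))  # finds the rule, but only if you already know the wording
```

The empty list is indistinguishable from "there is no such rule," which is exactly the problem.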

That's the insidious part. You don't know what you're missing.

The First Build: Hybrid RAG Inside a Local Search CLI

I spec'd out the Tier 3 upgrade and then actually built it (2026-05-08): a local hybrid RAG CLI engine on top of an SQLite database.

The architecture, because the specifics matter:

  • BM25 (sparse keyword ranking over a full-text index) for exact-term recall
  • vec0 (SQLite vector extension) for dense embedding search
  • RRF (Reciprocal Rank Fusion, my fusion layer) to merge the two ranked lists
  • A Sonnet synthesis pass to produce a cited answer from the top chunks

That last step is important. You don't just want chunks — you want a model that reads the top results and writes you an actual answer, with citations so you can audit it.
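
One way the input to that synthesis pass could be assembled, as a rough sketch. The Chunk type, the prompt wording, and the file paths are all assumptions for illustration; the model call itself is left out on purpose, and this is not the actual prompt the tool uses.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # file the chunk came from (hypothetical paths below)
    text: str


def build_cited_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    numbered = "\n\n".join(
        f"[{i}] ({c.source})\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered excerpts below. "
        "Cite excerpts inline as [n]. If they do not contain the answer, say so.\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )


top_chunks = [
    Chunk("memory/release.md", "Never ship to prod on Fridays."),
    Chunk("memory/checklist.md", "Tag every release before it goes out."),
]
prompt = build_cited_prompt("What's the rule on deploys?", top_chunks)
print(prompt)
```

Because every excerpt carries a number and a source path, each claim in the answer can be traced back to a file and checked.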

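BM25 alone is grep with a scoring bonus: it still needs the query's words to appear in the text. Embeddings alone blur exact terms such as identifiers and rare names. The two are genuinely complementary, each covering a blind spot the other can't, and RRF is what lets me combine them: it merges by rank, so the two incompatible score scales never have to be reconciled.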
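
Reciprocal Rank Fusion itself fits in a few lines. The sketch below is a generic version, not the code from my CLI: each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k = 60 is the commonly used default, and the file names are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical results from the two retrievers, best hit first.
bm25_hits = ["deploy-rules.md", "release-checklist.md", "oncall.md"]
vector_hits = ["ship-to-prod.md", "deploy-rules.md", "release-checklist.md"]

fused = rrf_fuse([bm25_hits, vector_hits])
print([doc for doc, _ in fused])
# ['deploy-rules.md', 'release-checklist.md', 'ship-to-prod.md', 'oncall.md']
```

Documents that both retrievers return rise to the top, while a document only one of them found (ship-to-prod.md here) still makes the list instead of being dropped.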

The Honest Part

I thought I was done.

And then I ran into a fresh problem: the local hybrid RAG CLI indexes only .md files. A PDF dropped into the vault is invisible to it. A memory file edited five minutes ago won't appear until the next reindex. And instruction files outside the indexed corpus? Also invisible.
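
A coverage check makes those gaps visible. This is a sketch under assumptions: the function name, the idea of passing the last index time as a Unix timestamp, and the report keys are all hypothetical, not part of the actual CLI.

```python
from pathlib import Path


def find_unindexed(
    vault: Path,
    last_indexed: float,
    indexed_suffixes: tuple[str, ...] = (".md",),
) -> dict[str, list[Path]]:
    """List files the index cannot see: wrong file type, or edited after the last reindex.

    `last_indexed` is a Unix timestamp (assumption: the indexer records one somewhere).
    """
    wrong_type: list[Path] = []
    modified_since_index: list[Path] = []
    for path in sorted(vault.rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in indexed_suffixes:
            wrong_type.append(path)            # e.g. a PDF dropped into the vault
        elif path.stat().st_mtime > last_indexed:
            modified_since_index.append(path)  # indexed type, but the index copy is out of date
    return {"wrong_type": wrong_type, "modified_since_index": modified_since_index}
```

Calling find_unindexed(Path("vault"), last_indexed=...) before trusting an empty search result turns "nothing found" into "nothing found, and here is what was never searched."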

RAG had its own silent false negatives.

So I built a second layer — a unified memory-search tool (2026-06-11) — that runs RAG and adaptive multi-grep in one pass.

The grep layer here isn't the dumb version I started with. It targets instruction files and recently-modified files specifically — the things RAG structurally can't cover. Results are merged, stale hits are flagged and down-ranked, and I get a synthesized answer with raw hits for audit.
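
The merge step could look roughly like this. It is a simplified sketch, not the real tool: the Hit type, the 0.5 penalty, the file names, and the assumption that RAG and grep scores are already on a comparable scale are all mine. "Stale" here is taken to mean a RAG hit whose file changed after it was indexed.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    path: str
    score: float  # higher is better; assumed comparable across both layers
    origin: str   # "rag" or "grep"
    stale: bool = False


def merge_hits(
    rag_hits: list[Hit],
    grep_hits: list[Hit],
    modified_since_index: set[str],
    stale_penalty: float = 0.5,
) -> list[Hit]:
    """Merge both layers, flagging and down-ranking RAG hits whose file changed after indexing."""
    merged: dict[str, Hit] = {}
    for hit in rag_hits:
        stale = hit.path in modified_since_index
        score = hit.score * stale_penalty if stale else hit.score
        merged[hit.path] = Hit(hit.path, score, "rag", stale)
    for hit in grep_hits:
        current = merged.get(hit.path)
        # grep reads the live file, so a stronger grep hit replaces the indexed copy.
        if current is None or hit.score > current.score:
            merged[hit.path] = Hit(hit.path, hit.score, "grep", False)
    return sorted(merged.values(), key=lambda h: (-h.score, h.path))


rag = [Hit("a.md", 0.9, "rag"), Hit("b.md", 0.8, "rag"), Hit("c.md", 0.5, "rag")]
grep = [Hit("a.md", 0.7, "grep"), Hit("instructions.md", 0.6, "grep")]
for hit in merge_hits(rag, grep, modified_since_index={"a.md", "c.md"}):
    print(hit.path, hit.score, hit.origin, "STALE" if hit.stale else "")
```

In this toy run, the stale RAG hit for a.md is replaced by the live grep hit, instructions.md shows up even though it was never indexed, and c.md stays in the results but sinks to the bottom with a stale flag.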

It's grep coming back… but smarter.

Why This Matters

The lesson is: fuse, don't choose.

Semantic search handles synonyms, paraphrase, meaning. Keyword search handles exact terms, fresh files, things outside the embedding corpus. Neither is complete. The failure mode for both is identical and invisible — a result that looks like "nothing found" when the thing is actually there.

RRF is simple and it works: run both searches, score each document by its rank in each list, and sum the scores. The documents both lists agree on rise to the top as the confident hits.

If you're building any kind of memory or retrieval layer for an AI system — even a small one — wire up both from the start. The incremental cost is low. The silent false-negative problem is real, and you won't see it until a user (or you) notices something obviously missing.

P.S. The PDF thing caught me twice. If you're using a RAG system that globs only .md files, any PDF in your vault is invisible until you extract its text into a note. Worth knowing before you assume the index is complete.