TLDR: Grep finds the word you typed. RAG finds what you meant. Neither alone was enough — and fusing them costs almost nothing extra.

The Memory Problem

Apollo, my AI agent, runs off a vault of personal memory files — rules, decisions, project state, learned feedback patterns.

When Apollo needs to look something up mid-session, it has to search that vault.

For a long time, that meant one of two things: grep for a keyword, or load a pre-built concat file (~50K tokens of every memory file smooshed together).

Both were broken. Just in different ways.

Why Grep Alone Fails

This is the dangerous one: silent false negatives.

Grep finds what you typed. So if you search "deploy" and the memory note says "ship to prod" — grep returns nothing.

No error. No warning. Just… nothing.

Apollo assumes the rule doesn't exist and improvises. Fine if there's no rule. Catastrophically wrong if there is one.

That's not a search failure. That's a CONFIDENCE failure. Apollo confidently does the wrong thing because it found nothing — and it has no idea.

Why the Concat File Didn't Fix It Either

My first instinct was: just load all the memories at once.

Hence the concat reference file: one big blob of every memory file, built by a shell script that cats them together, rebuilt on save and available on demand.

The problems were real:

  • Staleness. Edits made directly in Obsidian (my note-taking app) between rebuilds? Invisible.
  • RAG never used it. My semantic search engine (BM25 + vector embeddings + Reciprocal Rank Fusion) already indexed memory/**/*.md individually. The concat was solving a problem that didn't exist for it.
  • The vocabulary problem doesn't go away. Even with everything in context, an LLM reading the blob still has to find the right section. You just push the problem up a layer.
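For anyone who hasn't met Reciprocal Rank Fusion: it merges several ranked result lists (say, one from BM25 and one from vector search) by giving each document 1/(k + rank) points per list and summing them. Here's a minimal Python sketch of the textbook version, assuming the commonly used k = 60. The file names and the `reciprocal_rank_fusion` helper are made up for illustration and are not lifted from my engine.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Higher fused score means a better match.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Best fused score first; ties broken by doc_id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings for the query "deploy".
bm25_hits = ["deploy-checklist.md", "release-notes.md"]
vector_hits = ["ship-to-prod.md", "deploy-checklist.md", "release-notes.md"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['deploy-checklist.md', 'release-notes.md', 'ship-to-prod.md']
```

Notice that ship-to-prod.md, which BM25 never matched, still lands in the fused list. That's the property that lets the vector side rescue vocabulary mismatches. A larger k flattens the gap between top-ranked and lower-ranked hits.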

So the concat gave me a big blob of memory that was stale, redundant for RAG, and didn't actually fix the vocabulary-mismatch problem. Cool.

The Fix: Hybrid RAG + Adaptive Multi-Grep

I built a hybrid search skill that runs semantic search and grep in a single pass, then merges the results.

  • RAG (semantic similarity search via my RAG engine's query command) handles vocabulary mismatch. Searching "deploy" finds "ship to prod". The embedding — text-embedding-3-large — finds what you meant, not just what you typed.
  • Adaptive multi-grep covers what RAG structurally can't: CLAUDE.md instruction files, which were never indexed, and any file edited since the last reindex. Grep reads the files on disk, so it's fresh by definition.
  • Grep runs concurrently in a background thread while RAG's round-trip is in flight — so the grep layer adds only ~0.4s on top of RAG's ~3.94s.
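Here's roughly what that shape looks like as a Python sketch. Everything in it is hypothetical: `rag_query` is a stub standing in for the RAG engine's query command, the mtime-based "edited since last reindex" check is an assumption, and "adaptive" is approximated as searching for the whole query plus each of its words (an assumption, not the skill's actual logic). It only illustrates the concurrency and merge pattern: grep runs in a background thread while the RAG call blocks, and RAG hits come first in the merged list.

```python
import os
import re
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def rag_query(query):
    """Hypothetical stub for the RAG engine's query command.

    Simulates a network round-trip and returns a canned hit regardless of query.
    """
    time.sleep(0.05)
    return ["memory/ship-to-prod.md"]


def grep_scope(root, last_reindex):
    """Yield files RAG can't see: CLAUDE.md files, plus .md files edited after the last reindex."""
    for path in Path(root).rglob("*.md"):
        if path.name == "CLAUDE.md" or path.stat().st_mtime > last_reindex:
            yield path


def multi_grep(query, root, last_reindex):
    """Case-insensitive search for the whole query and each word (assumed 'adaptive' step)."""
    terms = [query] + query.split()
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    hits = []
    for path in grep_scope(root, last_reindex):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if pattern.search(text):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


def hybrid_search(query, root, last_reindex):
    """Run grep in a background thread while the RAG round-trip is in flight, then merge."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        grep_future = pool.submit(multi_grep, query, root, last_reindex)
        rag_hits = rag_query(query)       # blocks on the (simulated) round-trip
        grep_hits = grep_future.result()  # usually finished by now
    # RAG hits first, then grep-only hits; dict.fromkeys drops duplicates in order.
    return list(dict.fromkeys(rag_hits + grep_hits))


def demo():
    """Build a throwaway vault and run one hybrid search over it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "memory").mkdir()
        files = {
            "memory/ship-to-prod.md": ("Always ship to prod from main.", 500),  # old, indexed
            "CLAUDE.md": ("Never deploy on Fridays.", 500),                     # never indexed
            "memory/new-note.md": ("Deploy script moved to ops/.", 2000),       # edited after reindex
        }
        for rel, (text, mtime) in files.items():
            path = root / rel
            path.write_text(text, encoding="utf-8")
            os.utime(path, (mtime, mtime))
        return hybrid_search("deploy", root, last_reindex=1000)


if __name__ == "__main__":
    print(demo())
```

In `demo()`, grep never scans ship-to-prod.md (it's old and already indexed), so that hit comes only from the RAG stub, while CLAUDE.md and the freshly edited note come only from grep. Keeping grep's scope narrow like this means the two layers don't redo each other's work.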

Total: ~4.3s end-to-end. Cost: ~6 tokens per search — just the query string embedded once. There's no LLM generation inside the search itself. The calling agent synthesizes in its own turn.

After shipping this, I retired the concat reference file entirely.

Why This Matters

The real lesson isn't "use RAG."

It's: understand where each tool fails.

Grep fails on vocabulary mismatch — and it fails silently. That's what makes it dangerous in a memory system. Silent failure means confidently wrong behavior instead of a catchable error.

RAG fails on freshness and on files it was never told to index.

Neither closes both gaps. The hybrid does.

If you're building agent memory and relying on grep alone, test this now: search for a concept using a word that doesn't appear in the relevant file. If you get nothing back — that's your gap.
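If you want to make that test repeatable, a small probe harness is enough. Here's a minimal Python sketch, assuming your vault fits in a dict of note name to text. `literal_search`, `find_vocabulary_gaps`, and the sample notes are all hypothetical, and `literal_search` only approximates grep's case-insensitive literal match.

```python
import re


def literal_search(query, vault):
    """Grep-style search: names of notes containing the query text, case-insensitive."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [name for name, text in vault.items() if pattern.search(text)]


def find_vocabulary_gaps(probes, vault, search=literal_search):
    """Return the (query, expected_note) probes whose note the search misses.

    For a vocabulary-gap test, pick queries that do NOT appear verbatim
    in the expected note.
    """
    gaps = []
    for query, expected in probes:
        if expected not in search(query, vault):
            gaps.append((query, expected))
    return gaps


# Hypothetical vault and probes; swap in your own notes and search function.
vault = {
    "release-rules.md": "Never ship to prod on a Friday afternoon.",
    "style.md": "Prefer short commit messages.",
}
probes = [
    ("deploy", "release-rules.md"),  # synonym probe: "deploy" never appears in the note
    ("commit", "style.md"),          # literal-match control
]

print(find_vocabulary_gaps(probes, vault))  # [('deploy', 'release-rules.md')]
```

Pass your own search function in via `search=`. Any probe still listed afterwards is a concept your memory layer can't reach with that wording.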

P.S. The concat reference file wasn't a dumb idea — it was yesterday's answer to yesterday's problem. The moment RAG got cheap enough to run on every query, the concat became a liability. Sometimes the right refactor is recognizing when the old workaround can finally go.