Why Your RAG Chunks Don't Know Where They Came From

The Setup

I was building a real RAG system for Apollo, my AI assistant that reads my Obsidian vault (my personal knowledge base, thousands of notes). The goal: ask Apollo a question, have it retrieve the right chunks from memory, and get back a synthesized, cited answer.

Simple enough in theory.

What I Tried First

I started where everyone starts: fixed-size chunking. Every 500 tokens, slice.

It's fast. It's dumb. It cuts mid-sentence and drops ideas off cliffs.
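For contrast, here's how little there is to fixed-size chunking. A minimal sketch, assuming whitespace-separated words stand in for tokens (a real pipeline would count with the embedding model's tokenizer):

```python
def fixed_chunks(text: str, size: int = 500) -> list[str]:
    """Slice text into consecutive windows of `size` tokens.

    Assumption: tokens are approximated by whitespace-separated words.
    Boundaries ignore sentences and headings, which is why ideas get cut mid-thought.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    note = "The migration is in progress. " * 3
    for chunk in fixed_chunks(note, size=4):
        print(repr(chunk))  # the second window starts with "progress."
```

With `size=4`, the second window is `'progress. The migration is'`: one sentence, sliced in half.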

OK, so I switched to semantic chunking: compare embeddings of consecutive sentences and cut wherever the distance between them spikes. Mechanically better. Still not right.
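The mechanics fit in a few lines. A minimal sketch: `embed` here is a hypothetical bag-of-words stand-in for a real embedding model, and the fixed 0.8 cosine-distance threshold is an assumed value (real implementations often derive the cut-off from the distribution of distances instead, such as a percentile).

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def semantic_chunks(text: str, threshold: float = 0.8) -> list[str]:
    """Split into sentences, then start a new chunk wherever the distance
    between two consecutive sentences exceeds `threshold`."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine_distance(embed(prev), embed(sent)) > threshold:
            chunks.append(" ".join(current))  # distance spiked: close the chunk
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

On "The webinar tool tracks events. The webinar tool replaces Notion. Bananas are yellow fruit." the first two sentences share three words (distance 0.4) and stay together; the third shares none (distance 1.0) and starts a new chunk.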

So I moved to agentic chunking: feed the document to gpt-4.1-mini with type-specific prompts and let the LLM decide where topics actually shift. Real section breaks. Coherent ideas. Strict bounds, too: each chunk stays between 200 and 1200 tokens, and cuts land at H2/H3 headers and at ## Why / ## How to apply blocks.
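The LLM call depends on prompts and client code that aren't shown here, but the hard rules around it are deterministic. A minimal sketch of those rules only (not the agentic chunker itself), assuming word counts stand in for tokens: cut at H2/H3 headers, merge sections under 200 tokens into the next one, and split anything over 1200 into roughly equal pieces.

```python
import math
import re

# An H2 or H3 header at the start of a line ("## Why", "### Detail"); H4+ is not matched.
HEADER = re.compile(r"^#{2,3} ", re.MULTILINE)


def n_tokens(text: str) -> int:
    return len(text.split())  # assumption: words approximate tokens


def split_at_headers(markdown: str) -> list[str]:
    """Cut before every H2/H3 header. '## Why' and '## How to apply' are H2s,
    so each of those blocks starts its own section."""
    starts = [m.start() for m in HEADER.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first header
    bounds = starts + [len(markdown)]
    sections = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]


def enforce_bounds(sections: list[str], lo: int = 200, hi: int = 1200) -> list[str]:
    """Merge sections shorter than `lo` forward; split any longer than `hi`."""
    merged, buffer = [], ""
    for section in sections:
        buffer = f"{buffer}\n\n{section}" if buffer else section
        if n_tokens(buffer) >= lo:
            merged.append(buffer)
            buffer = ""
    if buffer:  # a short tail joins the previous chunk, if there is one
        if merged:
            merged[-1] = f"{merged[-1]}\n\n{buffer}"
        else:
            merged.append(buffer)
    out = []
    for chunk in merged:
        words = chunk.split()
        if len(words) <= hi:
            out.append(chunk)
            continue
        pieces = math.ceil(len(words) / hi)    # fewest pieces that fit under hi
        step = math.ceil(len(words) / pieces)  # roughly equal-sized pieces
        # Re-joining on spaces drops the original line breaks in oversized chunks.
        out.extend(" ".join(words[i:i + step]) for i in range(0, len(words), step))
    return out
```

Splitting oversized chunks evenly, rather than into full 1200-token windows plus a remainder, keeps the last piece from falling under the lower bound when `hi` is at least twice `lo`.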

The chunks were genuinely good.

And retrieval was STILL broken.

The Wall I Hit

Here's the thing nobody warned me about.

A chunk is retrieved completely naked. No filename. No document title. No frontmatter.

So when the retriever pulled a chunk starting with ## Architecture — Repo: /Users/me/Developer/webinar-pm, it had no idea that was from my webinar PM project memory. The heading looked coherent. It was coherent. But it was severed from every anchor that would make it useful.

Beautifully cut chunks. Orphaned fragments handed to the synthesis model.

That's… not retrieval. That's confetti.

The Fix That Actually Worked

The rule that unlocked it: include enough surrounding context that the chunk is interpretable on its own.

Concretely: inject the document's frontmatter into each chunk at index time. The file name, description, type, dates — baked verbatim as a header on the chunk text before you embed it.

Now a retrieved chunk doesn't just say ## Architecture. It says:

a webinar PM tool (project, updated 2026-05-28) — Internal event PM replacing Notion V3. ## Architecture…

The synthesis model knows what it's reading.
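A minimal sketch of that indexing step, approximating the header format in the example above. Assumptions: frontmatter is flat `key: value` YAML (nested YAML needs a real parser such as PyYAML's `yaml.safe_load`), the field names `name`, `type`, `updated`, and `description` are illustrative, and the colon separator is arbitrary.

```python
def parse_frontmatter(note: str) -> tuple[dict[str, str], str]:
    """Split a note into (frontmatter fields, body).

    Assumption: flat `key: value` lines between a leading '---' and a closing '---'.
    """
    if not note.startswith("---\n"):
        return {}, note
    end = note.find("\n---", 3)  # closing fence
    if end == -1:
        return {}, note
    fields: dict[str, str] = {}
    for line in note[4:end].splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip().strip('"')
    body = note[end + 4:].lstrip("\n")
    return fields, body


def contextualize(chunk: str, fields: dict[str, str]) -> str:
    """Prefix a chunk with a one-line header built from its note's frontmatter,
    so the text you embed (and later hand to synthesis) says where it came from."""
    meta = []
    if fields.get("type"):
        meta.append(fields["type"])
    if fields.get("updated"):
        meta.append(f"updated {fields['updated']}")
    header = fields.get("name", "untitled note")
    if meta:
        header += f" ({', '.join(meta)})"
    if fields.get("description"):
        header += f": {fields['description']}"
    return f"{header}\n\n{chunk}"
```

At index time you'd embed `contextualize(chunk, fields)` for every chunk of a note instead of the bare chunk, and store that same string so the synthesis model sees the header too.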

I also wired in per-chunk dates so synthesis can flag staleness. If a chunk describes "a migration in progress" but the note is 9 months old, that surfaces as a hedge — not a confident present-tense claim. Retrieval over a living knowledge base without date-awareness is basically lying to yourself.
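A minimal sketch of the date check, assuming each chunk carries its note's `updated` date as an ISO string and using an arbitrary 180-day threshold. The returned string is meant to travel with the chunk into the synthesis prompt.

```python
from datetime import date
from typing import Optional


def staleness_note(updated: str, today: date, max_age_days: int = 180) -> Optional[str]:
    """Return a hedge for the synthesis prompt if the source note is older than
    `max_age_days`, else None. `updated` is an ISO date string (YYYY-MM-DD);
    the 180-day default is an assumption to tune per vault."""
    age = (today - date.fromisoformat(updated)).days
    if age > max_age_days:
        return (
            f"Source last updated {updated} ({age} days ago); "
            "treat present-tense claims in it as possibly stale."
        )
    return None
```

Passing `today` explicitly (`date.today()` in production) keeps the check deterministic and easy to test: a note updated 2025-08-01 and checked on 2026-05-01 is 273 days old and gets flagged.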

(One honest gotcha: the agentic chunker sometimes returns non-JSON for edge-case files, so I built in an automatic structural fallback. Watch the fallback rate; mine runs under 5%.)
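One way to structure that guard, as a sketch: the LLM call is left out (`llm_reply` is whatever text the model returned), the expected reply shape (a JSON list of chunk strings) is an assumption, and `FallbackChunker` and `structural_fallback` are hypothetical names, with a header-based split standing in for whatever structural fallback you use. The counters make the fallback rate something you can actually watch.

```python
import json
import re
from typing import Optional


def structural_fallback(markdown: str) -> list[str]:
    """Hypothetical fallback: cut before every H2/H3 header."""
    parts = re.split(r"(?m)^(?=#{2,3} )", markdown)
    return [p.strip() for p in parts if p.strip()]


def parse_llm_chunks(raw: str) -> Optional[list[str]]:
    """Accept the reply only if it is a non-empty JSON list of non-empty strings."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(data, list) and data and all(isinstance(c, str) and c.strip() for c in data):
        return data
    return None


class FallbackChunker:
    """Uses the LLM's chunks when they parse; otherwise falls back and counts it."""

    def __init__(self) -> None:
        self.files = 0
        self.fallbacks = 0

    def chunk(self, markdown: str, llm_reply: str) -> list[str]:
        self.files += 1
        chunks = parse_llm_chunks(llm_reply)
        if chunks is None:
            self.fallbacks += 1
            chunks = structural_fallback(markdown)
        return chunks

    @property
    def fallback_rate(self) -> float:
        return self.fallbacks / self.files if self.files else 0.0
```

Validating the shape, not just that `json.loads` succeeds, matters: a reply like `{"chunks": []}` is valid JSON but still useless as a chunk list.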

Why This Matters to Me

The lesson isn't specific to Obsidian vaults or personal memory systems.

Chunks are retrieved without their document's context. If the chunk can't be understood alone, a smarter splitting strategy won't save you.

Agentic chunking is GENIUS, genuinely better than any mechanical approach I tried. But semantic quality at the boundary is only half the job.

The other half is what's inside the chunk when retrieval hands it to your synthesis layer.

Build for the chunk being naked. Dress it before you embed it.