TL;DR: Real attendee PII was already committed, and we only found it during a full security audit, not at the gate. Replace it with synthetic data, gitignore the fixtures directory, and never trust a README note claiming the data is clean until you've verified it yourself.

The Setup

We were building a webinar lead importer for an ecommerce business — a tool that takes webinar attendee exports, parses them, and pushes leads into Close (their sales CRM), with a Claude LLM pass in the middle to enrich and triage.

Normal stuff. Nothing exotic.

To build it, you need test data. So, as every builder does at least once, we grabbed a real export.

113 rows. Real names, real emails, real phone numbers. Actual attendees from an actual webinar on actual health topics.

Dropped it in fixtures/. Committed it. Pushed it.

What the Audit Found

Fast-forward to a full CSO security audit across the ecommerce business's entire fleet.

The webinar lead importer came up. And there, right in the git history, fully visible to anyone with repo access — 113 real attendees who signed up for a health webinar, permanently baked into version control.

That alone is bad.

But here's the part that made me cringe harder: the README had a note claiming the fixtures were synthetic.

It was wrong.

The "safeguard" was actively lying to anyone who came after us. We hadn't just committed real PII — we'd also committed a false assurance that we hadn't.

The Fix (Three Moves)

  1. Replace every row with synthetic data — fake names, generated emails, fake phone numbers. Same shape. Zero real PII.
  2. Gitignore fixtures/ so no future export can accidentally slip back in.
  3. Correct the README so the note actually reflects reality.
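Move 1 is the one worth automating. Below is a minimal sketch of a synthetic fixture generator, assuming a simple CSV export. The column names (`first_name`, `last_name`, `email`, `phone`), the name pools, and the helper names are hypothetical; match them to your real export's header row. Emails use `example.com`, which RFC 2606 reserves for documentation, and phone numbers fall in the 555-0100 to 555-0199 range reserved for fictional use, so no generated row can belong to a real person.

```python
import csv
import io
import random

# Hypothetical column layout; match it to your real export's header row.
FIELDS = ["first_name", "last_name", "email", "phone"]

# Hypothetical name pools; any obviously fake list works.
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor", "Casey", "Riley", "Morgan", "Jamie"]
LAST_NAMES = ["Rivera", "Chen", "Okafor", "Novak", "Singh", "Larsen", "Duarte", "Kim"]


def synthetic_rows(n: int, seed: int = 0) -> list[dict[str, str]]:
    """Generate n fake attendee rows with the same shape as a real export."""
    rng = random.Random(seed)  # seeded so the fixture is identical on every run
    rows = []
    for i in range(n):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        rows.append({
            "first_name": first,
            "last_name": last,
            # example.com is reserved (RFC 2606); the index keeps emails unique
            "email": f"{first.lower()}.{last.lower()}.{i}@example.com",
            # 555-0100..555-0199 is reserved for fictional numbers
            "phone": f"+1-202-555-01{rng.randint(0, 99):02d}",
        })
    return rows


def write_fixture(rows: list[dict[str, str]], out) -> None:
    """Write rows as CSV (header first) to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_fixture(synthetic_rows(113), buf)
    print(buf.getvalue().splitlines()[0])  # the header row
```

Seeding the generator is a deliberate choice: tests that assert on specific rows stay stable, and regenerating the file never produces a noisy diff.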

One important thing to understand about the gitignore: it only stops the next export from getting in.

Once data is committed, it's in history; replacing the file in a new commit doesn't erase the earlier versions. Our fix was to replace the content and roll forward rather than rewrite history. For a localized fixture directory, that's the right call.

Why It Happened

I want to be honest about the root cause, because it wasn't carelessness.

It was a category error.

Fixtures feel like throwaway dev-only files. Not production. Not user-facing. Just… testing infrastructure. That mental model is exactly what lets real PII slide past every other guard you have.

And the false README note made it worse — someone (probably me) wrote that note with the intention of using synthetic data, before the real export was ever dropped in. The note became a placeholder that got committed as gospel.

Why This Matters

There's a principle I keep coming back to from building LLM apps: the model can't leak what it never had.

Strip the sensitive fields from the prompt before it reaches the model. That's your strongest defense — not instructions, not output filtering. Data minimization at the source.
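Here is a minimal sketch of what minimization at the source can look like for a lead-triage step. The field names, the `ALLOWED_FIELDS` set, and the prompt wording are all hypothetical, and the sketch only builds the prompt string; it does not call any model API.

```python
import json

# Hypothetical allowlist: only the fields the triage step actually needs.
# Anything not listed (email, phone, name) never reaches the prompt.
ALLOWED_FIELDS = {"attended_minutes", "questions_asked", "registration_source"}


def minimize(lead: dict) -> dict:
    """Keep only allowlisted fields; drop everything else, including PII."""
    return {k: v for k, v in lead.items() if k in ALLOWED_FIELDS}


def build_triage_prompt(lead: dict) -> str:
    """Build the prompt from the minimized record only."""
    payload = json.dumps(minimize(lead), sort_keys=True)
    return f"Classify this webinar lead as hot, warm, or cold:\n{payload}"


# A synthetic lead record, shaped like a parsed export row.
lead = {
    "email": "alex.rivera.0@example.com",
    "phone": "+1-202-555-0142",
    "first_name": "Alex",
    "attended_minutes": 47,
    "questions_asked": 2,
    "registration_source": "newsletter",
}
prompt = build_triage_prompt(lead)
```

An allowlist is used instead of a denylist because it fails closed: if the export vendor adds a new column tomorrow, it stays out of the prompt until someone decides it belongs there.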

Same rule applies one layer down.

The repo can't leak what it never contained.

Synthetic fixtures from day one. Not as a policy rule you write in a README. As a discipline you enforce before the first git add.
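One way to turn "verify, don't trust the README" into something enforceable is a small check that runs before commit. The sketch below is a deliberately narrow heuristic, not a real PII scanner: it only flags email addresses whose domain isn't one of the RFC 2606 example domains. The function names and the `fixtures` default path are hypothetical.

```python
import re
from pathlib import Path

# Domains reserved for documentation (RFC 2606); anything else is suspect.
SAFE_DOMAINS = {"example.com", "example.net", "example.org"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def suspicious_emails(text: str) -> list[str]:
    """Return every email in text whose domain is not a reserved example domain."""
    return [
        m.group(0)
        for m in EMAIL_RE.finditer(text)
        if m.group(1).lower() not in SAFE_DOMAINS
    ]


def scan(root: Path) -> dict[str, list[str]]:
    """Map each file under root to the suspicious emails it contains."""
    findings: dict[str, list[str]] = {}
    if not root.is_dir():
        return findings  # nothing to scan
    for path in root.rglob("*"):
        if path.is_file():
            hits = suspicious_emails(path.read_text(errors="ignore"))
            if hits:
                findings[str(path)] = hits
    return findings


def check(root: Path = Path("fixtures")) -> int:
    """Print findings and return a process exit code: 1 if anything looks real."""
    findings = scan(root)
    for path, hits in findings.items():
        print(f"{path}: {len(hits)} non-example email(s), e.g. {hits[0]}")
    return 1 if findings else 0
```

A pre-commit hook could call `sys.exit(check())` so a commit fails the moment a real-looking address shows up in the fixtures directory, whatever the README says.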

P.S. If you're reading this and have a fixtures/ or testdata/ directory in a client repo — go check it now. Seriously. I'll wait.