When 'Make It All Searchable' Is the Wrong Instinct

TLDR: When your AI agent needs to filter relational data, reach for structured SQL column-filters — not embeddings. The column whitelist is your security boundary. RAG is for prose.

The Setup

I'm building a practice-management app for a law firm client — contacts, matters, engagement letters, conflict checks, the works.

The AI chat agent could already search contacts by name.

Then I decided I wanted more.

The Moment I Almost Built the Wrong Thing

My exact instinct: "all fields in all tables need to be searchable for the agent."

And my second instinct — almost automatic — was to reach for RAG (vector embeddings, my semantic search layer).

I caught myself before I built it, but just barely.

Why Embeddings Were the Wrong Tool Here

So what's wrong with embedding everything? GREAT question.

Think about what's actually in the contacts and matters tables.

Statuses like active / closed
Fee amounts
Dates like 2026-05-31
UUIDs like a3f8c...
Phone numbers, email addresses, enums

Embedding a phone number is nonsense. You'd get fuzzy, probabilistic matches on values that demand exact comparison. The whole point of relational data is that it's structured — and the right tool for structured data is… SQL.

What I Built Instead

Two query tools with a {field, operator, value} schema — one for contacts, one for matters.

The operators:

eq, neq, contains
gt, gte, lt, lte
in, is_empty, is_not_empty

Conditions AND together. The agent can now ask things like: "find all open matters with a fee above a threshold for contacts in Utah" — and it gets exact answers, instantly, with no embedding infra.

The Security Bit (Don't Skip This)

Here's the part that actually matters most for anyone doing something similar.

When you hand arbitrary field-querying to an LLM, the column whitelist is your injection boundary.

Unknown column names throw immediately. The model can't probe columns you didn't explicitly allow. That's your fence.

One more trap I hit: don't run your name-search sanitizer on filter values. It strips characters that corrupt emails and ISO dates. Filter values go straight through the query builder as parameterized inputs — that's what parameterization is for.

Commit dca4f75. Verified every operator against the live DB before shipping.

Why This Matters to Me

There's something that feels powerful about "make it all searchable" — it sounds thorough.

But it conflates two completely different problems. Free-text prose → RAG. Structured columns → SQL filters.

If you're wiring an AI agent to a database-backed app, split those surfaces out loud before you pick a tool. The decision takes five minutes. Building the wrong thing takes days.

P.S. The free-text columns (notes, descriptions) are next. That's where RAG actually earns its keep — but that's a different post.