TLDR: If your background process returns nothing and throws no error, you have no idea if it worked. Make noise the default — before you do anything else.
the dashboard that lied
We were deep into an ecommerce business's supply-chain app — inventory, forecasting, webinar scheduling, all in one Supabase-backed Next.js project.
The board had five webinars scheduled.
The dashboard said "Upcoming Webinars: 0."
Looked perfectly fine.
That's the trap. Not a red error. Not an exception. Not a flash of wrong. Just... a zero. A HEALTHY-looking zero that nobody questioned.
what I chased first
My first instinct was the query logic. Wrong table? Stale data? Date filter too aggressive?
I spent a good 20 minutes on root-cause theories while the actual problem sat there invisible.
Turned out a PostgREST (the server that auto-generates a REST API from the Postgres schema) schema-cache reload, triggered by a completely unrelated migration, had started enforcing a foreign-key ambiguity on webinars→topics that had been silently auto-resolving before. PGRST201 was firing on every request.
But this line was swallowing it whole:
const [{ data: upcomingWebinars }] = await Promise.all([...])
It ignores .error entirely. No throw. No log. Just an empty array and a zero on the screen.
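Here is a minimal sketch of the louder version, assuming each query resolves to the { data, error } shape described above. The names QueryResult, unwrap, fetchUpcomingWebinars, and loadDashboard are hypothetical, made up for illustration. They are not part of supabase-js, and the stubbed query only imitates the failure.

```typescript
// Hypothetical result shape, modeled on the { data, error } pair in the post.
type QueryError = { code: string; message: string };
type QueryResult<T> = { data: T | null; error: QueryError | null };

// Throw as soon as a result carries an error, instead of letting it
// degrade into an empty list further down the page.
function unwrap<T>(label: string, result: QueryResult<T>): T | null {
  if (result.error) {
    throw new Error(`${label} failed: ${result.error.code} ${result.error.message}`);
  }
  return result.data;
}

// Stub standing in for the real webinars query (an assumption for this example).
async function fetchUpcomingWebinars(): Promise<QueryResult<string[]>> {
  return { data: null, error: { code: "PGRST201", message: "ambiguous relationship" } };
}

async function loadDashboard(): Promise<number> {
  // Keep the whole result object so .error is still available to check.
  const [webinarsResult] = await Promise.all([fetchUpcomingWebinars()]);
  const upcomingWebinars = unwrap("upcoming webinars", webinarsResult) ?? [];
  return upcomingWebinars.length;
}
```

With the stub above, loadDashboard rejects with the PGRST201 error instead of resolving to 0, which is the whole point: the zero never reaches the screen.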
what made it systemic
Once I stopped guessing and started looking for the pattern, I saw it everywhere.
- ZAI (my LLM sub-agent for code delegation): returning HTTP 200 with stop_reason: "model_context_window_exceeded" and an empty content[0].text. Looked like a response. Was nothing.
- Supabase RLS (row-level security, the access-control layer on the database): supabase-js doesn't throw on a denied read; it returns { data: null, error: null }. Completely indistinguishable from "record not found."
- My autonomous calendar prep agent (the 6am script that preps calendar briefings before I wake up): dying inside launchd (macOS's background job runner) and writing the real error to a log file nobody was watching. No alert. No TTS. No iMessage. Just silence that looked exactly like success.
Every single one of these had a "success" face on.
the fix that worked
My own notes from that sprint are blunt about it: "Never propose root-cause fixes on top of hidden failures — you'll be guessing."
That's the real lesson. Restore visibility first. Then diagnose from real error data.
For shell commands and background scripts, this is the pattern I use everywhere now:
LOG=/tmp/scanner-$(date +%Y%m%d-%H%M%S).log
<command> 2>"$LOG"
echo "EXIT=$?"
tail -c 4000 "$LOG"
Exit code. Full stderr log path. Inline tail for quick scan. Three things you can actually use — without re-running anything.
For delegate calls to ZAI or MiniMax (my two LLM sub-agents), the post-call checklist:
- HTTP 200 ✓ — necessary, not sufficient
- stop_reason: end_turn, not model_context_window_exceeded and not a max_tokens truncation
- content[].text is non-empty: actually check it, don't assume
- Files actually changed (if code was supposed to be written, confirm the diff)
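The first three checks can be sketched as one function. The DelegateResponse shape is an assumption built only from the fields named in the checklist, and checkDelegateResponse is a hypothetical helper, not an API from either provider.

```typescript
// Hypothetical response shape, based on the fields named in the checklist.
type DelegateResponse = {
  status: number;
  stop_reason: string;
  content: { text?: string }[];
};

// Returns a list of problems; an empty list means every check passed.
function checkDelegateResponse(res: DelegateResponse): string[] {
  const problems: string[] = [];
  if (res.status !== 200) problems.push(`HTTP ${res.status}`);
  if (res.stop_reason !== "end_turn") problems.push(`stop_reason=${res.stop_reason}`);
  // Join all text blocks so a response of empty strings still counts as empty.
  const text = res.content.map((block) => block.text ?? "").join("").trim();
  if (text.length === 0) problems.push("empty content text");
  return problems;
}
```

Returning every problem at once, rather than stopping at the first, means one log line shows the full picture. The files-changed check is left out because it depends on how the repo is tracked.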
And for Supabase — always destructure { data, error } and check both. error: null with data: null is not a success. It's a question you forgot to ask.
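One way to keep that question from being forgotten is to give the null/null case its own name. This is a sketch under the same assumed { data, error } shape; ReadResult, ReadOutcome, and classifyRead are hypothetical names, not supabase-js types.

```typescript
// Hypothetical shapes for illustration; not supabase-js types.
type ReadResult<T> = { data: T | null; error: { message: string } | null };

type ReadOutcome<T> =
  | { kind: "ok"; data: T }
  | { kind: "error"; message: string }
  | { kind: "ambiguous"; note: string };

// Force callers to handle "no data, no error" as its own case
// instead of treating it as a successful empty read.
function classifyRead<T>(result: ReadResult<T>): ReadOutcome<T> {
  if (result.error) {
    return { kind: "error", message: result.error.message };
  }
  if (result.data === null) {
    return { kind: "ambiguous", note: "no data and no error: missing row or denied read" };
  }
  return { kind: "ok", data: result.data };
}
```

A caller that switches on kind has to decide what "ambiguous" means for that screen: log it, alert on it, or go check the access policy.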
why this matters to me
I'm building systems that run while I sleep. My autonomous calendar prep agent fires at 6am so everything is ready when I wake up. The supply-chain dashboard gets checked by people who trust it.
Silent failure is worse than loud failure because it looks like success. A loud failure gets fixed. A silent failure gets shipped.
You don't debug what looks fine. You build on it. And then six migrations later you're staring at "0 upcoming webinars," wondering what went wrong — when the error was there the whole time, routed neatly to /dev/null.
Make it loud first. Then make it right.