TLDR: FIFO lot depletion + cron retries = potential double-depletion. Fixed it with a DB-enforced partial unique index keyed to the snapshot date. Ran the repair runbook. Found zero rows to repair — because the lots table was empty. Best possible outcome.

the setup

We're building a supply chain dashboard for an ecommerce business — Shopify + Recharge (subscription billing, their bread and butter) orders syncing into Supabase, with inventory tracked across lots (a lot is a physical batch of product with its own cost basis).

FIFO depletion means oldest lot gets consumed first as orders ship.

The cron job that syncs orders also runs daily inventory snapshots that deplete those lots.

I also wired in bounded retry logic for Shopify and Recharge API calls — max 5 attempts, jitter on 429s and 5xx responses. Right call. Those upstream APIs are flaky.

But it opened a gap.

the risk I found

If a snapshot run fails halfway through, retries, and starts depleting lots again from the same snapshot date… it depletes them twice.

First run: take 100 units from Lot A.

Retry: doesn't know Lot A was already hit. Takes 100 more.

Phantom negative inventory. And the silent kind — no error thrown, just wrong numbers accumulating while nobody looks.

migration 042: let the database say no

The fix was a DB-enforced idempotency key.

Migration 042 added a snapshot_depletion row source with a partial unique index keyed on source_ref_id = YYYYMMDD — the snapshot date.

Every depletion write for that date either succeeds once, or Postgres rejects it on any subsequent attempt.

Not deduped in application code. Not a SELECT-before-INSERT (a classic race condition waiting to happen). Rejected at the constraint level. The database is the authority.

The pattern worth remembering:

  • Make the idempotency key deterministic — derived from the business event (the date), not a UUID minted at write time
  • Pin the uniqueness in a partial index so retries become no-ops automatically
  • Trust the DB, not your application logic

the repair runbook I didn't need

After shipping 042, I wrote a data-repair runbook. Steps 1 through 5. Assumed the double-depletion might have already run before the fix landed and that production data could be dirty.

On June 8th I ran Step 1 — read-only diagnosis, exactly as the runbook specified.

inventory_lots = 0 rows. lot_transactions = 0 rows.

The ops person at an ecommerce business who actually enters the inventory lots into the system hadn't entered any yet.

The FIFO subsystem was built, migration 042 was live, the bug was fixed… and there was nothing to corrupt. The runbook had nothing to repair.

why that's actually the best outcome

A bug you fix before the data arrives is worth ten you fix after.

The idempotency constraint is in the schema now. When the ops person enters the first real lot — next week, next month, whenever — any cron retry that would've silently double-depleted it will just bounce. Postgres rejects it. No alarm, no runbook, no late-night data investigation.

Build idempotency in at schema time. Key it to something deterministic. Let the database enforce it.

And if you write a repair runbook: run the diagnosis before you run anything else. The runbook's premise might already be wrong — and in this case, being wrong was the best news I could get.