The One SQL Comment That Broke Every BWH Report — and Why a Human Had to Fix It

Good — the advisor confirms my analysis and surfaces two refinements: treat both SKU brand-prefixes consistently, and remove the surviving "Labs" fragment throughout. Applying now.

TLDR: A one-line migration comment asserted "no SKU overlap between stores." It was wrong. Every velocity, forecast, and ledger report in the supply chain app silently doubled two products. Thirty seconds of SQL would have caught it. The fix needed a human — because the machine can't decide which product is canonical.

The Setup

We're building a supply chain operations app for an ecommerce business (a supplement brand) — inventory tracking, burn-down forecasts, velocity charts, the works.

The client actually runs two Shopify storefronts: the main store and a second storefront (a separate entity selling the same retail products through its own storefront).

When I added multi-store support in migration 022, I was confident the two stores had separate product catalogs.

I wrote the comment right at the top of the file:

"No SKU overlap between stores, so each product row stays unique per (store, shopify_id)."

And I shipped it at 00:28 EDT.

The Wall

Next morning, my client (the stakeholder reviewing the app) sits down to look at the reports.

The Velocity table shows two rows for BRAND-BGC — Beta Glucan Replenishment Cream — with no store label. One trending -89%. The other +500%.

Completely baffling if you don't know what caused it.

Same thing with BRAND2-KIDS-60 (Kids Gummies). Two identical-looking rows. Two wildly different numbers.

The report SQL was doing GROUP BY COALESCE(canonical_product_id, p.id) — perfectly reasonable, assuming each SKU maps to exactly one product row. When that assumption breaks, every report table quietly renders a second unlabeled ghost row. Velocity, Ledger, Forecast, L10, History — all of them.

Two client-blockers, first review, morning after launch.

What I Got Wrong

The facts were easy to verify. Thirty seconds of SQL:

SELECT sku, COUNT(*)
FROM products
GROUP BY sku
HAVING COUNT(*) > 1;

Would have caught BRAND-BGC and BRAND2-KIDS-60 before I declared the multi-store work done.

I didn't run it. I trusted the comment.

Lesson burned in: SQL assertions in migration comments are not tests. Run the query. Every time.

What We Built — and Why a Human Had to Apply It

The products table already had a canonical_product_id self-FK from earlier dedup work. The fix path was clear: point the second storefront's duplicate rows at their main-store canonical counterparts. Reports auto-dedupe through the existing GROUP BY.

The tempting move? Auto-merge by SKU.

That would have been wrong.

The machine has no idea which of the two BRAND-BGC rows is canonical. Maybe the second storefront has fresher data. Maybe the main-store row is the one POs get linked to. Maybe there's a reason both need to stay visible. Collapsing them programmatically is a business decision, not a data-layer decision.

So instead, I built an admin affordance on /admin/products: a "Split across N rows" badge that surfaces any SKU collision, with a "Use this as canonical" button next to each dupe.

My client and I reviewed together. She confirmed which rows were the true canonical. I applied the four UPDATE statements via SQL — Cream (second storefront → main store), Gummies (second storefront → main store), and two book-product duplicates.

The numbers snapped into shape immediately.

Forecast row count: 15 → 13. Cream stockout: was "92 days" or "823 days" depending which dup you were looking at → 131 days, single row.

Why This Matters to Me

The supply chain app is supposed to help my client trust her data — to actually make decisions from it, instead of second-guessing every number.

Two rows for the same product, no label, no explanation — that kills trust instantly. And it happened because a comment told us something was safe that we never actually verified.

The pattern I took away: surface ambiguity in the UI, let the human resolve it, let the system propagate the decision. That's what human-in-the-loop actually means in a data quality context. Not a chatbot confirmation dialog — a purpose-built affordance that makes the right choice easy and the wrong choice visible.

(P.S. — same alias pattern bit us again, later, on inventory counts. Alias rows were triple-counting physical stock via SUM(quantity) GROUP BY canonical_id. BGC on-hand was reading 1,737. True count: 579. Thirty-day runway looked like "soon"; was actually "critical." Same class of bug. Same class of fix.)