The Metric Was Wrong Because the Baseline Was Wrong

TLDR: Our watch-through metric looked terrible — until I realized we were measuring against the wrong thing entirely.

The Setup

I've been building a webinar analytics dashboard for a health education business I work with.

Real webinar data, pulled from AEvent (our webinar platform), Shopify (our e-commerce backend), and a few other sources — all wired up into one place.

The whole point was accuracy.

The Wall We Hit

One of the core metrics I cared about was pitch retention — how many attendees actually stayed through the sales pitch?

The pitch doesn't start until 45 minutes in.

And our webinars run long. We're talking 2 hours and 15 minutes of video runtime.

So the dashboard was measuring watch-through against that full 2h15m runtime.

The number looked brutal.
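To make the trap concrete, here's a minimal sketch of that naive calculation. The minutes-watched values are made up for illustration, and `naive_watch_through` is a hypothetical helper, not the dashboard's actual code.

```python
FULL_RUNTIME_MIN = 135  # 2h15m of video runtime: the wrong denominator for pitch retention


def naive_watch_through(minutes_watched: list[float], runtime_min: float = FULL_RUNTIME_MIN) -> float:
    """Average share of the FULL runtime each attendee watched (the misleading baseline)."""
    if not minutes_watched:
        return 0.0
    # Cap at the runtime so nobody counts for more than 100%.
    shares = [min(m, runtime_min) / runtime_min for m in minutes_watched]
    return sum(shares) / len(shares)


# Hypothetical sample: every one of these attendees was still watching at the 45-minute mark.
watched = [50, 60, 70, 80]
print(f"{naive_watch_through(watched):.0%}")  # 48%
```

Every attendee in that sample stayed past the point where the pitch starts, yet the metric reads under half. The denominator is doing all the damage.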

I thought we had an engagement problem. I thought people were bailing early. I was ready to overhaul the whole content strategy.

What Was Actually Broken

It wasn't the content.

It was the baseline.

Measuring "did they stay for the pitch?" against a 2h15m runtime is like grading a relay race by whether everyone finished the cool-down lap.

The auto-detected content start was also wrong, thrown off by a timezone offset in the viewer timeline labels. So the anchor itself was drifting.

Two baseline errors, stacked.
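Here's a minimal sketch of how a timezone offset can shift the anchor. It assumes (my assumption, not a detail from our data) that the timeline labels are local clock times for an event running at UTC-4 and that everything downstream works in UTC. The date and both helper functions are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the event ran at UTC-4 (e.g. US Eastern daylight time). Adjust for your event.
EVENT_TZ = timezone(timedelta(hours=-4))
LABEL_FORMAT = "%Y-%m-%d %I:%M %p"


def label_to_utc(event_date: str, label: str, tz: timezone = EVENT_TZ) -> datetime:
    """Interpret a timeline label like '7:04 PM' as local event time, then convert to UTC."""
    local = datetime.strptime(f"{event_date} {label}", LABEL_FORMAT)
    return local.replace(tzinfo=tz).astimezone(timezone.utc)


def label_as_utc_naively(event_date: str, label: str) -> datetime:
    """The bug: stamping the local label as if it were already UTC."""
    local = datetime.strptime(f"{event_date} {label}", LABEL_FORMAT)
    return local.replace(tzinfo=timezone.utc)


correct = label_to_utc("2024-05-14", "7:04 PM")
buggy = label_as_utc_naively("2024-05-14", "7:04 PM")
print(correct.isoformat())  # 2024-05-14T23:04:00+00:00
print(correct - buggy)      # 4:00:00
```

A four-hour error in the anchor pushes every "minutes since content start" calculation off by the same amount, which is why the start time has to be settled before any retention number means anything.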

The Fix

We recalibrated everything to the pitch window — the period that actually matters for sales.

Auto-detect the true content start from the viewer timeline labels (around 7:04 PM) instead of trusting the raw timestamp.

Then measure watch-through against pitch time only, not total duration.
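Here's a minimal sketch of that pitch-window metric. It assumes each session has join and leave times, uses the detected content start as the anchor, and counts an attendee as having heard the pitch if they were still watching after it began. That last rule is a simplification; you might require a minimum number of pitch minutes instead. `Session`, `pitch_retention`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

PITCH_OFFSET = timedelta(minutes=45)  # the pitch begins 45 minutes after content start


@dataclass
class Session:
    """One attendee's viewing session (naive or aware, but consistent with content_start)."""
    joined: datetime
    left: datetime


def pitch_retention(sessions: list[Session], content_start: datetime) -> float:
    """Share of everyone who joined whose session extends past the pitch start."""
    if not sessions:
        return 0.0
    pitch_start = content_start + PITCH_OFFSET
    heard = sum(1 for s in sessions if s.left > pitch_start)
    return heard / len(sessions)


# Hypothetical sample: content starts at 7:04 PM, four attendees.
start = datetime(2024, 5, 14, 19, 4)
sessions = [
    Session(start - timedelta(minutes=5), start + timedelta(minutes=50)),
    Session(start, start + timedelta(minutes=60)),
    Session(start + timedelta(minutes=10), start + timedelta(minutes=80)),
    Session(start, start + timedelta(minutes=30)),  # left before the pitch
]
print(f"{pitch_retention(sessions, start):.0%}")  # 75%
```

The denominator is everyone who joined, and the only question asked of each session is whether it reached the pitch. How long someone lingered after the pitch no longer drags the number down.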

The number we got back?

97% of the people who joined the webinar actually heard the pitch.

Same data. Different baseline. Completely different story.

Why This Matters

Here's the thing: the grading rubric behind that dashboard? AI helped generate it overnight. (I literally ran Claude Code, my AI coding assistant, in a loop while I slept and woke up to a full dashboard with letter grades.)

And when I showed it to the team, I had to admit: "I haven't tweaked this rubric at all."

The AI gave me a reasonable structure. But it guessed at the thresholds. It didn't know our numbers.

Same lesson came up in the team call. Someone mentioned an industry stat — replay drives 2.4x the unique sales of a live event. Our events lead didn't flinch: "A 2.4 is probably a little high. We definitely see around two times."

She just knew. Because she had the actual history.

That's the move.

A metric is only as honest as what you measure it against. Your tool can build the chart. Your AI can build the rubric. But you have to supply the right baseline — or the whole thing lies to you.

Get the denominator right. Then the numerator tells you something real.

P.S. This fix also unlocked the insight that our post-webinar follow-up sequence may be doing more revenue work than the live event. That only became visible once the live-event numbers stopped being artificially deflated by a bad baseline.