TLDR: Make.com fires an email on every scenario error. The right signal isn't error count — it's failure mode. Classify first, route second.

The Setup

I run a lot of scenarios on Make.com, the no-code automation platform.

And Make.com is enthusiastic about error emails.

Every time a scenario fails — even a transient HTTP 500 that auto-retries two seconds later and recovers fine — it fires an email from noreply@us1.make.com.

Within a week of running Apollo (my home AI assistant + scanning system), my inbox looked like a smoke detector that won't stop beeping.

The Wrong Instinct

My first instinct was to triage by volume.

Count errors per scenario, rank them descending, fix the noisy ones first.

This is the natural read. More errors = more broken, right?

Wrong.

A scenario hitting transient 500s might error six times in a morning and fully recover on its own every single time. A scenario that has genuinely stopped (not retrying, not recovering, just dead) might show up exactly once.

Sort by count and you spend your Friday debugging the loud thing that was never broken, while the actually-stopped scenario sits quietly at the bottom of the list.
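
Here's a toy version of that Friday, as a rough sketch. The scenario names and the failure-mode labels are made up for illustration, and it assumes each error has already been labeled with a mode.

```python
from collections import Counter

# Hypothetical morning of error emails: (scenario name, failure mode).
errors = [("sync-crm", "transient")] * 6 + [("invoice-export", "stopped")]

# Ranking by volume puts the self-healing scenario first.
counts = Counter(name for name, _ in errors)
by_count = [name for name, _ in counts.most_common()]

# Ranking by failure mode puts the halted scenario first.
# Lower number = needs attention sooner (an assumed ordering).
SEVERITY = {"stopped": 0, "fixable": 1, "unknown": 2, "transient": 3}
worst = {}
for name, mode in errors:
    if name not in worst or SEVERITY[mode] < SEVERITY[worst[name]]:
        worst[name] = mode
by_mode = sorted(worst, key=lambda name: SEVERITY[worst[name]])

print(by_count)  # loud scenario on top
print(by_mode)   # dead scenario on top
```

Same seven emails, opposite to-do lists.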

What Actually Works

Triage by failure mode, not by volume.

The scanner I built now classifies every Make.com error email into one of four buckets before deciding what to do:

  • Transient — 500s, timeouts, things that auto-retry. Drop them. They're not asking for anything.
  • Fixable — auth expiry, a missing webhook, a config drift I can actually address. Highlight, with the scenario edit URL preserved so I can click straight into it.
  • Unfixable / stopped — the scenario is halted and isn't coming back on its own. Surface loudly.
  • Unknown — one retry as a cheap probe. If it fails again, escalate.
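
The four buckets above could be sketched roughly like this. It's a minimal illustration, not my actual scanner: the regex patterns, the action names, and the `classify` / `route` helpers are all assumptions, and the patterns would need tuning against the wording of real Make.com error emails.

```python
import re
from enum import Enum


class Mode(Enum):
    TRANSIENT = "transient"
    FIXABLE = "fixable"
    STOPPED = "stopped"
    UNKNOWN = "unknown"


# Hypothetical patterns. Order matters: a stopped scenario
# outranks anything else that happens to match.
RULES = [
    (Mode.STOPPED, re.compile(r"has been (stopped|deactivated)", re.I)),
    (Mode.FIXABLE, re.compile(r"\b40[13]\b|unauthorized|token expired|webhook not found", re.I)),
    (Mode.TRANSIENT, re.compile(r"\b50[0234]\b|timed? ?out|rate limit", re.I)),
]

# What to do with each bucket (action names are placeholders).
ACTIONS = {
    Mode.TRANSIENT: "drop",
    Mode.FIXABLE: "highlight",
    Mode.STOPPED: "surface_loudly",
    Mode.UNKNOWN: "retry_once",
}


def classify(text: str) -> Mode:
    """Return the first matching failure mode, or UNKNOWN."""
    for mode, pattern in RULES:
        if pattern.search(text):
            return mode
    return Mode.UNKNOWN


def route(text: str, already_retried: bool = False) -> str:
    """Classify first, route second. An unknown error gets one probe."""
    mode = classify(text)
    if mode is Mode.UNKNOWN and already_retried:
        return "escalate"
    return ACTIONS[mode]
```

For example, `route("Connection timed out")` comes back as `"drop"`, while an unrecognized message returns `"retry_once"` the first time and `"escalate"` once `already_retried` is set.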

One detail from that list deserves its own note: always preserve the scenario edit URL in the alert. It sounds like a small thing. It's not. If triage fires without the direct edit link, I have to go find the scenario in Make's dashboard myself. Friction kills follow-through.
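
Carrying the edit link through to the alert might look something like this. The URL shape in the regex (a make.com host with `scenarios/<id>` in the path) is an assumption, so check it against the links in your own error emails; `extract_edit_url` and `build_alert` are hypothetical helper names.

```python
import re
from typing import Optional

# Assumed link shape: any make.com subdomain, "scenarios/<id>" in the path,
# optionally ending in "/edit". Adjust to match your real emails.
EDIT_URL = re.compile(
    r"https://(?:[\w-]+\.)*make\.com/(?:\S*?/)?scenarios/\d+(?:/edit)?"
)


def extract_edit_url(body: str) -> Optional[str]:
    """Pull the first scenario link out of an error email body."""
    match = EDIT_URL.search(body)
    return match.group(0) if match else None


def build_alert(scenario: str, mode: str, body: str) -> str:
    """Format an alert that always says whether a direct link exists."""
    url = extract_edit_url(body)
    link = url if url else "(no edit link found in email)"
    return f"[{mode}] {scenario}\n{link}"
```

If the link is missing, the alert says so explicitly rather than silently dropping it, so I at least know I'll have to go hunting.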

The Restraint Lesson

When I later tried improving the triage judgment with a bigger, more capable model, it flagged more things, not better things.

GREAT lesson: triage rewards RESTRAINT, not horsepower.

Recall is easy: any system can flag everything and catch 100% of real issues. Precision is the differentiator. The job of a triage system is to leave you with a short list you'll actually act on.

A system that surfaces fifteen things a day, only three of which matter, trains you to ignore it.

A system that surfaces two things — both of which genuinely need you — earns trust.
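
To put numbers on those two systems, here's the precision arithmetic as a tiny sketch. The `precision` helper is just for illustration; the counts are the ones from the two examples above.

```python
def precision(surfaced: int, actionable: int) -> float:
    """Share of surfaced alerts that actually needed attention."""
    return actionable / surfaced if surfaced else 0.0


noisy = precision(surfaced=15, actionable=3)  # 3 of 15 alerts mattered
quiet = precision(surfaced=2, actionable=2)   # 2 of 2 alerts mattered

print(f"noisy: {noisy:.0%}, quiet: {quiet:.0%}")
```

One in five versus every single one. That gap is what decides whether I keep reading the alerts.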

Why This Matters to Me

I've built enough notification systems to know they die by false positives.

The moment I start skimming the subject line and hitting archive without reading, the system has failed. Doesn't matter how clever the underlying automation is.

Classifying by failure mode before deciding what to surface is what keeps that trust intact. The email count doesn't tell you what to do. The failure mode does.

Build the classifier, not the counter.