TL;DR: When an AI agent has side effects (creating tasks, sending messages, triggering anything), a prompt saying "don't do X" is not a hard stop. You need a code-level gate too. Learned this the fun way.
The Setup
Apollo (my personal AI operating system) runs a background scanner that checks whether my clients' integrations are still authenticated — Gmail, Slack, Google Calendar, Signal. When something needs attention, it creates a task in Things 3, my task manager.
Each client has a different stack. An ecommerce business has Gmail, Slack, Calendar. A Signal-only client? Signal only.
Mostly this works exactly as designed.
The Wall
Then I saw it in Things: slack_[client]: needs_reauth.
I stared at that for a second.
That client doesn't have Slack.
Not "hasn't connected it yet." Doesn't. Use. Slack. Full stop.
But Sonnet (the Claude model powering my scanner) had gone exploring — found a slack_[client] key somewhere in the auth status output, called it a failure, and my code dutifully spun up a reauth task for a channel that does not exist.
And here's the beautiful, terrible part: left unchecked, this loops. Scanner runs. Phantom auth error fires. Task created. I clear the task. Scanner runs again. Phantom fires again...
INFINITE. REAUTH. LOOP. For a service that was never connected.
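Here is a hypothetical, stripped-down version of the flow with no gate at all, to make the loop concrete. `naive_scan`, the report dict, and the Signal-only `CLIENT_E` are illustrative assumptions, not Apollo's actual code. The point is that clearing the task does nothing, because the next run recreates it from the same phantom key.

```python
def naive_scan(model_report: dict[str, str]) -> list[str]:
    """Turn every key the model flags as needs_reauth into a task title.

    Hypothetical sketch: nothing checks whether the service exists
    for that client, so a phantom key becomes a real task.
    """
    return [
        f"{key}: {state}"
        for key, state in model_report.items()
        if state == "needs_reauth"
    ]


# The model "finds" a Slack key for a client that has no Slack at all.
phantom_report = {"gmail_CLIENT_A": "ok", "slack_CLIENT_E": "needs_reauth"}

total_created = 0
for run in range(1, 4):
    # Each run starts from a cleared task list, yet the phantom comes back.
    tasks = naive_scan(phantom_report)
    total_created += len(tasks)
    print(f"run {run}: {tasks}")
```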
What I Tried First
My first move was the obvious one: fix the prompt.
I added a ## OUT OF SCOPE — NEVER ATTEMPT section to collector.py and spelled it out explicitly:
- A Signal-only client → Signal only. No Slack, Gmail, Calendar, Notion, Drive.
- Notion → the ecommerce business only.
- A law firm client uses Google Chat and another client uses Microsoft Teams; neither is accessible via this MCP (the tool-connection layer my scanner uses).
And honestly? That helps. The model mostly respects it.
But "mostly" is not a guarantee when the output of a model is what decides whether a real side effect fires. I've trusted prompts too many times and regretted it.
The Fix That Actually Worked
I added a hard IN_SCOPE allowlist in main.py.
```python
# Auth keys that are allowed to become Things tasks; everything else is ignored.
IN_SCOPE = {
    "gmail_CLIENT_A", "gmail_CLIENT_B", "gmail_CLIENT_C", "gmail_CLIENT_D",
    "slack_CLIENT_A", "slack_CLIENT_B", "slack_CLIENT_C",
    "calendar_CLIENT_A", "calendar_CLIENT_B", "calendar_CLIENT_C", "calendar_CLIENT_D",
    "signal",
}
```
Before any auth_status key becomes a Things task, it has to clear that set. Anything not in IN_SCOPE gets logged as "Ignored out-of-scope auth keys" and goes absolutely nowhere.
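A minimal sketch of that gate might look like this. It assumes the model's output arrives as a dict mapping auth keys to states, and `create_things_task` is a hypothetical stand-in for whatever actually talks to Things 3. Only the idea of checking against `IN_SCOPE` and the "Ignored out-of-scope auth keys" log message come from the post; the allowlist is abbreviated here to keep the example short.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scanner")  # hypothetical logger name

# Abbreviated copy of the allowlist; the real one lists every client's keys.
IN_SCOPE = {"gmail_CLIENT_A", "slack_CLIENT_A", "calendar_CLIENT_A", "signal"}


def create_things_task(title: str) -> None:
    """Hypothetical stand-in for the code that creates the Things 3 task."""
    print(f"Things task: {title}")


def filter_in_scope(auth_status: dict[str, str]) -> dict[str, str]:
    """Keep only allowlisted keys; log the rest and drop them."""
    ignored = sorted(key for key in auth_status if key not in IN_SCOPE)
    if ignored:
        log.info("Ignored out-of-scope auth keys: %s", ignored)
    return {key: state for key, state in auth_status.items() if key in IN_SCOPE}


def create_reauth_tasks(auth_status: dict[str, str]) -> list[str]:
    """Create tasks only for in-scope keys that need reauth; return their titles."""
    titles = []
    for key, state in filter_in_scope(auth_status).items():
        if state == "needs_reauth":
            title = f"{key}: {state}"
            create_things_task(title)
            titles.append(title)
    return titles


if __name__ == "__main__":
    model_output = {
        "gmail_CLIENT_A": "needs_reauth",  # real problem: becomes a task
        "slack_CLIENT_E": "needs_reauth",  # phantom key: logged, no task
        "signal": "ok",                    # healthy: nothing to do
    }
    create_reauth_tasks(model_output)
```

The gate sits on the model's output, after the model has done its work. The model can still report whatever it likes; only allowlisted keys ever reach `create_things_task`.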
Belt and suspenders.
The prompt tells the model what to look at. The code decides what's allowed to cause a side effect. Those are two separate jobs — and only one of them should be handed to a language model.
Why This Matters to Me
I'm building more and more systems where AI agents take action, not just generate text. Create a task. Trigger a webhook. Send a message.
The more consequential those actions are, the more I need to stop relying on "the prompt said not to" as my only gate.
Prompt instructions shape behavior under normal conditions. A code-level allowlist enforces hard limits when behavior drifts — and given enough time, it WILL drift.
If your agent has side effects, add the allowlist. It's twenty lines and it'll save you from a morning of phantom tasks.
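The same idea generalizes beyond auth keys to webhooks, messages, or any other action. Below is a minimal sketch, assuming each side-effecting function takes its target (a URL, a channel, a task key) as its first argument. The `allowlisted` decorator, `OutOfScopeError`, and the example webhook URL are all hypothetical.

```python
import functools
from typing import Any, Callable


class OutOfScopeError(Exception):
    """Raised when an action targets something that is not allowlisted."""


def allowlisted(allowed: frozenset[str]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Wrap a side-effecting function so it only runs for allowlisted targets.

    Assumes the wrapped function takes its target as the first positional argument.
    """
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(target: str, *args: Any, **kwargs: Any) -> Any:
            if target not in allowed:
                # Refuse loudly instead of letting the side effect fire.
                raise OutOfScopeError(f"{fn.__name__}: {target!r} is not allowlisted")
            return fn(target, *args, **kwargs)
        return wrapper
    return decorator


@allowlisted(frozenset({"https://hooks.example.com/ops"}))
def trigger_webhook(url: str, payload: dict[str, Any]) -> str:
    # Hypothetical side effect: a real version would POST the payload.
    return f"POST {url} {payload}"


if __name__ == "__main__":
    print(trigger_webhook("https://hooks.example.com/ops", {"event": "reauth"}))
    try:
        trigger_webhook("https://unknown.example.net/hook", {"event": "reauth"})
    except OutOfScopeError as err:
        print(f"blocked: {err}")
```

Raising is a design choice: it makes a blocked action impossible to miss. The scanner in this post logs and skips instead, which suits a background job that shouldn't crash over one bad key.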
P.S. The same commit also fixed an unescaped f-string brace bug in the prompt builder that was raising a NameError. So yes, two ways the scanner was broken at once. Fun morning.
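If you haven't hit this one yet: inside a Python f-string, a literal `{` or `}` has to be doubled. Otherwise Python treats the contents as an expression to evaluate. The prompt text and function names below are made up, but they reproduce the same class of NameError.

```python
def build_prompt_broken(client: str) -> str:
    # Bug: {key} and {status} are meant as literal placeholder text for the
    # model, but the f-string evaluates them as Python expressions. Neither
    # name is defined, so calling this raises NameError.
    return f"Check auth for {client}. Reply one line per key as {key}: {status}"


def build_prompt_fixed(client: str) -> str:
    # Fix: double the braces so the f-string emits literal { and } characters.
    return f"Check auth for {client}. Reply one line per key as {{key}}: {{status}}"


if __name__ == "__main__":
    try:
        build_prompt_broken("CLIENT_A")
    except NameError as err:
        print(f"broken: {err}")
    print(build_prompt_fixed("CLIENT_A"))
    # Check auth for CLIENT_A. Reply one line per key as {key}: {status}
```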