TL;DR: If your iMessage agent marks its own sends with a zero-width joiner (U+200D) and uses str.isprintable() to walk decoded attributedBody blobs, the second choice silently erases the first. The fix is a one-line whitelist. The lesson goes deeper than that.
The Setup
I built a Python poller that reads ~/Library/Messages/chat.db (Apple's iMessage database) and lets Apollo — my AI agent — send and receive iMessages autonomously.
One design decision felt really clever at the time: prefix every outbound message Apollo sends with a zero-width joiner (U+200D, ZWJ). Invisible to any human reading it. And the poller could check for it and skip the message — because that row came from Apollo, not a real human. Clean, simple self-filtering.
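Here is a minimal sketch of the self-filter idea. It assumes the poller ends up with one decoded body string per row. The names mark_outbound and is_own_message are hypothetical and do not come from the original project.

```python
from typing import Optional

ZWJ = "\u200d"  # ZERO WIDTH JOINER: renders as nothing in a chat bubble

def mark_outbound(text: str) -> str:
    """Prefix an agent-authored message with the invisible marker (hypothetical helper)."""
    return ZWJ + text

def is_own_message(body: Optional[str]) -> bool:
    """True if a decoded body carries the agent's marker; None/empty bodies never match."""
    return bool(body) and body.startswith(ZWJ)
```

Note that is_own_message trusts whatever string the decoder hands it. It cannot tell a human message from an agent message whose marker was lost upstream.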
I was proud of that.
The Wall
Apollo started looping.
Every message it sent, it replied to. Then replied to that reply. Then replied to THAT reply. I killed the process and sat there staring at my phone.
The incoming message thread looked completely normal. But internally, the bridge was losing its mind.
What I Tried (That Didn't Work)
On macOS Ventura+, message.text is often NULL. The real content lives in message.attributedBody — an NSArchiver typedstream blob (a legacy binary serialization format Apple has used for decades) that you have to decode yourself.
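Here is a sketch of the read side. It assumes the message table exposes text and attributedBody columns as described above. The names open_chat_db and fetch_messages_after are hypothetical. Polling by ROWID is one plausible design, not necessarily what Apollo's poller does. The SQLite URI flag mode=ro opens the database read-only, so the poller can never write to the live file.

```python
import sqlite3
from pathlib import Path

def open_chat_db(path: str = "~/Library/Messages/chat.db") -> sqlite3.Connection:
    """Open chat.db read-only via a SQLite URI (hypothetical helper)."""
    db = Path(path).expanduser().resolve()
    return sqlite3.connect(f"{db.as_uri()}?mode=ro", uri=True)

def fetch_messages_after(conn: sqlite3.Connection, last_rowid: int):
    """Return (rowid, text, attributedBody) rows newer than last_rowid.

    text may be NULL; attributedBody then holds the typedstream blob to decode.
    """
    cur = conn.execute(
        "SELECT ROWID, text, attributedBody FROM message "
        "WHERE ROWID > ? ORDER BY ROWID",
        (last_rowid,),
    )
    return cur.fetchall()
```

Rows whose text is None would go on to the attributedBody decoder.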
My decoder found the NSString marker in the blob, skipped the 5-byte typedstream header, read the length byte, then walked forward consuming UTF-8 characters until it hit a non-printable codepoint. Standard practice. Stop at the junk.
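Here is a sketch of that header-scan path, run over a synthetic blob. The 5-byte skip and the length byte come from the description above. Treating a 0x81 length byte as "a 2-byte little-endian length follows" is an assumption taken from community reverse-engineering of typedstream, not from Apple documentation. The names decode_body and is_char are hypothetical. With its default predicate, the function deliberately reproduces the bug.

```python
from typing import Callable, Optional

HEADER_SKIP = 5  # bytes between the NSString marker and the length byte

def decode_body(blob: bytes, is_char: Callable[[str], bool] = str.isprintable) -> Optional[str]:
    """Header-scan decode; returns None when the marker is missing so a fallback can run."""
    marker = blob.find(b"NSString")
    if marker < 0:
        return None
    pos = marker + len(b"NSString") + HEADER_SKIP
    if pos >= len(blob):
        return None
    length = blob[pos]
    pos += 1
    if length == 0x81:  # assumption: 0x81 prefixes a 2-byte little-endian length
        length = int.from_bytes(blob[pos:pos + 2], "little")
        pos += 2
    text = blob[pos:pos + length].decode("utf-8", errors="replace")
    # Walk forward and stop at the first character the predicate rejects.
    body = []
    for ch in text:
        if not is_char(ch):
            break
        body.append(ch)
    return "".join(body)
```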
I patched the main decoder path. Still looped.
Turns out there's also a fallback path — when the precise header scan fails, the decoder scans for the longest printable run in the whole blob. That path had the exact same isprintable() check. Same bug, different branch. I patched one and left the other live.
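Here is a sketch of a fallback path like the one described, which returns the longest printable run in the whole blob. Treating bytes that are not valid UTF-8 (decoded as U+FFFD) as run breaks is an assumption. The function name longest_run is hypothetical. It takes the same kind of predicate, so the same bug lives here unless the whitelist is passed in.

```python
from typing import Callable

def longest_run(blob: bytes, is_char: Callable[[str], bool] = str.isprintable) -> str:
    """Fallback decode: the longest run of characters accepted by is_char.

    Bytes that are not valid UTF-8 become U+FFFD and are treated as breaks.
    """
    text = blob.decode("utf-8", errors="replace")
    best, start = "", 0
    for i, ch in enumerate(text):
        if ch == "\ufffd" or not is_char(ch):
            # Close the current run [start, i) and keep it if it is the longest so far.
            if i - start > len(best):
                best = text[start:i]
            start = i + 1
    # The blob may end mid-run.
    if len(text) - start > len(best):
        best = text[start:]
    return best
```

In this sketch, a short message can still lose the length contest to a metadata string, even with ZWJ whitelisted. Longest-run heuristics are fragile on their own.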
The Fix That Actually Worked
Here's the thing about U+200D:
'\u200d'.isprintable() # → False
Python's str.isprintable() returns False for ZWJ. It's Unicode category Cf, a "Format" character, not a printable one. So every time the decoder hit the ZWJ prefix on Apollo's own messages, the walk stopped at the very first character and returned an empty body. The fallback then grabbed the longest printable run in the blob, which turned out to be typedstream metadata, something like _kIMMessagePartAttributeName. The ZWJ was gone, the self-filter couldn't see it, and Apollo replied to its own message. Again. And again.
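You can check this with the standard library's unicodedata module. The helper describe is hypothetical. Its output suggests the problem is not specific to ZWJ: the other zero-width characters you might reach for (ZWNJ, zero-width space, word joiner) are also category Cf, so switching markers would not have slipped past the filter.

```python
import unicodedata

def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, general category, str.isprintable())."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
        ch.isprintable(),
    )

# ZWJ, ZWNJ, ZERO WIDTH SPACE, WORD JOINER, and a plain letter for contrast.
for ch in ["\u200d", "\u200c", "\u200b", "\u2060", "A"]:
    print(describe(ch))
```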
The fix was small: explicitly whitelist \u200d in the printable check, in both the header-scan path and the fallback path:
def is_body_char(c):
    # ZWJ (U+200D) is category Cf, so isprintable() rejects it; allow it explicitly.
    return c.isprintable() or c == '\u200d'
And a regression test: round-trip a ZWJ-prefixed payload through the full decoder and assert the ZWJ survives.
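Here is a sketch of what that regression test might look like in pytest style. The decoder is a simplified, hypothetical stand-in for the header-scan path and handles a single length byte only. make_blob builds a synthetic blob with the assumed layout, not a real Apple archive. A real test should also push the payload through the fallback path.

```python
ZWJ = "\u200d"

def is_body_char(c: str) -> bool:
    # The fix: printable characters plus ZWJ, the self-marker.
    return c.isprintable() or c == ZWJ

def decode_body(blob: bytes) -> str:
    """Simplified header-scan stand-in (hypothetical); raises ValueError if NSString is absent."""
    pos = blob.index(b"NSString") + len(b"NSString") + 5  # skip the 5-byte header
    length = blob[pos]
    text = blob[pos + 1:pos + 1 + length].decode("utf-8", errors="replace")
    out = []
    for ch in text:
        if not is_body_char(ch):
            break
        out.append(ch)
    return "".join(out)

def make_blob(text: str) -> bytes:
    """Synthetic blob with the assumed layout; single length byte only."""
    payload = text.encode("utf-8")
    if len(payload) >= 0x80:
        raise ValueError("this sketch only handles single-byte lengths")
    return (b"\x04\x0bstreamtyped\x84\x84\x84\x08NSString\x01\x94\x84\x01+"
            + bytes([len(payload)]) + payload + b"\x86\x84\x02iI")

def test_zwj_prefix_survives_decode():
    sent = ZWJ + "On my way"
    decoded = decode_body(make_blob(sent))
    assert decoded == sent
    assert decoded.startswith(ZWJ)
```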
Why This Matters to Me
The marker only works if every layer that touches the data knows about it.
I designed the ZWJ self-filter, wired it into the poller, and shipped it — without thinking through whether the decoder below it would preserve the marker at all. It didn't. And it failed silently: no exception, no empty string, no obvious error. Just the fallback picking a plausible-looking string and carrying on like nothing was wrong.
If you're building any system where an invisible control character has semantic meaning — test that it survives the full decode pipeline. "Printable" filters eat format characters. That's what they're designed to do.
P.S. All the raw chat.db knowledge (the epoch offset, the WAL read-only mode, the attributedBody header format) came from a Swift/SwiftUI iMessage triage client I built as a sibling project. Build the reference layer once; it pays dividends across every tool you build on top of the same data.
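For the epoch offset, here is a sketch that assumes the convention common on recent macOS: message.date counts nanoseconds since Apple's reference date, 2001-01-01 UTC. Older databases may use a different unit, so check against your own data. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Apple's reference date (2001-01-01 UTC) sits this many seconds after the Unix epoch.
APPLE_EPOCH_OFFSET = 978_307_200
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

def chat_db_date_to_datetime(date_value: int) -> datetime:
    """Convert message.date (assumed: ns since 2001-01-01 UTC) to an aware UTC datetime."""
    return APPLE_EPOCH + timedelta(microseconds=date_value // 1_000)

def chat_db_date_to_unix(date_value: int) -> float:
    """Same instant as Unix seconds, e.g. for comparing against time.time()."""
    return date_value / 1e9 + APPLE_EPOCH_OFFSET
```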