TLDR: If your code fetches a URL that came out of a third-party response header, validate the hostname against an allowlist before making the request. Even if you fully trust the sender.

the setup

We've been building out an influencer CRM for an ecommerce business — a Next.js app backed by Supabase that syncs Shopify order data, webinar records, the works.

Shopify paginates via the RFC 5988 Link response header. The pattern looks like this:

Link: <https://your-store.myshopify.com/admin/api/2024-01/orders.json?cursor=eyJsYXN0...>; rel="next"

You parse the <url>; rel="next" out of that header, then fetch it to get the next page.

The official docs say it best: just follow the Link headers verbatim.

So that's exactly what we did.

the wall the auditor found

A comprehensive security audit turned up 22 findings — and we shipped fixes on all 22 the same day (yes, I'm proud of that).

SEC-5 was the one I hadn't fully thought through: SSRF (server-side request forgery — tricking your server into fetching internal addresses it shouldn't, like a cloud metadata endpoint at 169.254.169.254).

The auditor's logic: the URL in that Link header is data coming from a third-party response. Your server then turns around and makes a new HTTP request to wherever that URL points. If anything puts a malicious URL in that header — a compromised vendor, a misconfigured proxy, response-splitting, anything — your sync worker happily fetches http://localhost:9000/admin or the AWS metadata endpoint, because nobody checked.

It feels paranoid. Shopify isn't evil. But that's not really the point.

The principle that transfers: any URL your code extracts from a response and then fetches is untrusted input — regardless of how much you trust the sender. Defense in depth means you don't rely on Shopify staying perfect forever.

the fix that actually works

The tempting version is a substring check: "myshopify.com" in next_url. Don't do this. It's bypassable — myshopify.com.evil.com passes that check.

The real fix is to parse the URL and match the hostname exactly:

from urllib.parse import urlparse

SHOPIFY_HOSTS = {"your-store.myshopify.com"}

def safe_next_url(link_header: str) -> str | None:
    url = parse_link_next(link_header)
    if not url:
        return None
    host = urlparse(url).hostname
    if host not in SHOPIFY_HOSTS:
        raise ValueError(f"SSRF guard: unexpected host in Link header: {host}")
    return url

Two lines of actual protection. If the hostname doesn't match the allowlist exactly, we raise and log — we don't follow.

why this matters to me

The documentation told me to follow the header verbatim. And that's correct advice for correctness — the cursor is opaque, you're not supposed to construct it yourself. But correctness for pagination and safety for where you point your HTTP client are two different questions, and I was only asking the first one.

The lesson I'm carrying forward: every time a URL comes from outside your code and your code is about to fetch it, ask yourself — "do I know where this is allowed to point?" If the answer is yes, encode that. If the answer is "I trust the source," that's not an answer.

P.S. Recharge uses the exact same Link header cursor pattern. Went back and added the same allowlist there the same afternoon.