Any best practices for canonicalization of URLs that reference untrusted IPFS gateways?

We deal with a lot of blockchain data that references metadata held in IPFS.

There is published guidance for how IPFS URI references should be made on blockchains, but people don’t necessarily follow it.

So we have to deal with cases where these metadata URIs aren’t ipfs:// URIs, but rather are references to IPFS gateways, and more specifically, are references to arbitrary third-party IPFS gateways, rather than to the canonical ipfs.io / dweb.link hostnames.

Users permanently cache the URLs we return, so we want to give them URLs that they will still be able to fetch from years later. But we have experienced flakiness with these arbitrary third-party gateways; and there is also no guarantee that they will be “permanent” in the way that IPFS references should be. In short, we don’t trust them.

So, currently, when we recognize that a metadata URL is a reference to one of these third-party IPFS gateways, we rewrite/canonicalize the URL: either into an ipfs:// reference (for long persistence), or into a URL pointing to a trusted IPFS gateway, e.g. ipfs.io (for immediate fetch).
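
For concreteness, here is a rough Python sketch of the kind of rewrite described above. It is not our production code: the function names, the loose CID regex, and the choice of https://ipfs.io as the default trusted gateway are assumptions made for illustration. It recognizes gateway URLs by shape only (path style /ipfs/<cid>/... and subdomain style <cid>.ipfs.<host>), and does not check that the host really is a gateway.

```python
import re
from urllib.parse import urlsplit

# Loose shape checks only (assumption): CIDv0 in base58btc ("Qm...") and
# CIDv1 in lowercase base32 ("b..."). A real implementation should decode the CID.
_CID_RE = re.compile(r"^(Qm[1-9A-HJ-NP-Za-km-z]{44}|b[a-z2-7]{20,})$")
_PATH_RE = re.compile(r"^/ipfs/([^/]+)(/.*)?$")


def extract_ipfs_path(url):
    """Return (cid, rest) if url is shaped like an IPFS gateway URL, else None."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return None
    m = _PATH_RE.match(parts.path)
    if m and _CID_RE.match(m.group(1)):
        # Path-style gateway: https://host/ipfs/<cid>/optional/path
        cid, rest = m.group(1), m.group(2) or ""
    else:
        # Subdomain-style gateway: https://<cid>.ipfs.host/optional/path
        labels = (parts.hostname or "").split(".")
        if len(labels) >= 3 and labels[1] == "ipfs" and _CID_RE.match(labels[0]):
            cid, rest = labels[0], parts.path
        else:
            return None
    if parts.query:
        rest += "?" + parts.query
    return cid, rest


def to_ipfs_uri(url):
    """Rewrite a gateway-shaped URL to ipfs://; leave anything else untouched."""
    found = extract_ipfs_path(url)
    if found is None:
        return url
    cid, rest = found
    return f"ipfs://{cid}{rest}"


def to_trusted_gateway(url, gateway="https://ipfs.io"):
    """Rewrite a gateway-shaped URL to point at a gateway we trust."""
    found = extract_ipfs_path(url)
    if found is None:
        return url
    cid, rest = found
    return f"{gateway.rstrip('/')}/ipfs/{cid}{rest}"
```

For example, to_ipfs_uri("https://gw.example.net/ipfs/bafkqac3tn5wwklltorzgs3th/meta.json") gives "ipfs://bafkqac3tn5wwklltorzgs3th/meta.json", while a URL with no IPFS shape is returned unchanged.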

Two questions:

  • Is recognizing+rewriting third-party IPFS gateways in URLs like this a best practice, or should we be leaving them alone?

  • Is there a simple probe that can be used to determine whether an arbitrary HTTP host is an IPFS gateway? Does the IPFS Companion browser extension use logic like this, or does it have a manually-curated list of recognized IPFS gateway domains?

I think it is a good practice. go-ipfs gateways would set some headers like x-ipfs-pop or x-ipfs-path on responses.

IPFS Companion uses the is-ipfs npm package plus DNSLink lookups to detect IPFS resources, so it is a slightly different use case from what you are looking for.

You can test if a host is a Gateway by asking it for some inlined plain text (to avoid content routing delays):

$ echo -n some-string | ipfs add --cid-version 1 -q --inline
bafkqac3tn5wwklltorzgs3th

Then do an HTTP GET for https://example.com/ipfs/bafkqac3tn5wwklltorzgs3th and see if you get some-string back. This way you confirm not only that the gateway is online, but also that it is a real one (capable of decoding CIDs).
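
If you want to script that probe, a minimal Python sketch could look like the following. The function names are made up for illustration, and the CID is built by hand (CIDv1, raw codec, identity multihash, base32) rather than with an IPFS library, which is an assumption that only holds for small payloads like this one.

```python
import base64
import urllib.error
import urllib.request


def inline_cid(payload: bytes) -> str:
    """Build a CIDv1 that embeds the payload itself (identity multihash)."""
    if len(payload) > 127:
        raise ValueError("sketch only handles lengths that fit in one varint byte")
    # 0x01 = CIDv1, 0x55 = raw codec, 0x00 = identity multihash, then length + data
    raw = bytes([0x01, 0x55, 0x00, len(payload)]) + payload
    # Multibase prefix "b" means lowercase base32 without padding
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


def http_get(url, timeout=10.0):
    """Return (status, first bytes of body), or (None, b"") on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read(1024)
    except (urllib.error.URLError, OSError):
        return None, b""


def looks_like_gateway(base_url, fetch=http_get, payload=b"some-string"):
    """Ask the host for an inlined CID and check that the payload comes back."""
    url = f"{base_url.rstrip('/')}/ipfs/{inline_cid(payload)}"
    status, body = fetch(url)
    return status == 200 and body == payload
```

Here inline_cid(b"some-string") produces the same bafkqac3tn5wwklltorzgs3th as the ipfs add command above, and looks_like_gateway("https://example.com") performs the GET. The fetch parameter is there only so the check can be exercised without network access.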

If you are looking for a passive check, Gateway responses carry an X-Ipfs-Path header with the content path:

$ curl -Is https://dweb.link/ipfs/bafkqac3tn5wwklltorzgs3th | grep -i x-ipfs-path
x-ipfs-path: /ipfs/bafkqac3tn5wwklltorzgs3th
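
The same header check could be done from code along these lines. This is only a sketch with hypothetical helper names. Keep in mind that any server can send this header, so it is weaker evidence than getting the inlined content back.

```python
import urllib.error
import urllib.request


def head_headers(url, timeout=10.0):
    """Send a HEAD request and return the response headers as a plain dict."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        # urllib follows redirects by default, so these are the headers
        # of the final response.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return dict(resp.headers.items())
    except (urllib.error.URLError, OSError):
        return {}


def has_ipfs_path_header(headers, expected_path=None):
    """True if X-Ipfs-Path is present (and equals expected_path, when given)."""
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    value = lowered.get("x-ipfs-path")
    if value is None:
        return False
    return expected_path is None or value == expected_path
```

Usage would be something like has_ipfs_path_header(head_headers(url), "/ipfs/bafkqac3tn5wwklltorzgs3th") for a URL that requests that content path.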