How to improve retrieval time for public IPFS CIDs

I’ve been tweaking IPFS settings to improve content retrieval times for certain public IPFS CIDs when they’re freshly created by other parties. On occasion, it takes up to a minute for a very capable server with IPFS installed to fetch the contents of a single CID (that isn’t a DAG) via its own API (127.0.0.1:5001/api…). I’ve found the API to be faster than the CLI.
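For reference, a request of this kind can be timed with a short script. This is a minimal sketch, assuming the node's RPC API listens on the default 127.0.0.1:5001 and that `cat` is the command being called; the helper names `build_api_url` and `timed_cat` are made up for illustration.

```python
import time
import urllib.parse
import urllib.request

# Assumption: the local node exposes its RPC API at the default address.
API_BASE = "http://127.0.0.1:5001/api/v0"


def build_api_url(command, arg, base=API_BASE):
    """Build an RPC URL such as <base>/cat?arg=<cid> (hypothetical helper)."""
    return f"{base}/{command}?{urllib.parse.urlencode({'arg': arg})}"


def timed_cat(cid, timeout=120.0):
    """Fetch a CID with `cat`; return (bytes received, seconds elapsed)."""
    # The RPC API expects POST requests, even for read-only commands.
    request = urllib.request.Request(build_api_url("cat", cid), method="POST")
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        size = len(response.read())
    return size, time.perf_counter() - start
```

Calling `timed_cat("<cid>")` a few times for the same freshly created CID should show whether the delay is mostly in the first lookup (finding a provider) or in every transfer.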

With that in mind, I’m wondering what the best approach is (if there is one) to improve retrieval times, including peering, multiaddrs, or anything else that can help. Something else I considered was dynamically finding info on the node hosting a file so that I can attach it to my IPFS POST requests as host_nodes. I’m not sure if this is possible.

Thank you!

I’m not enough of an expert to explain the difference between swarm connect and bootstrap adding, but make sure you know about these:

I have a feeling if your peers know about each other in that way, it might speed up access from any peer to any other in the group… if this is even an option for you to make your own peers know about each other.
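As far as I understand it, `swarm connect` opens a connection to a peer right now, while `bootstrap add` stores the peer in the config so the node dials it at startup. Here is a rough sketch of both calls against the local RPC API, assuming the default 127.0.0.1:5001 address; the helper names and the multiaddr in the usage note are placeholders, not real peers.

```python
import urllib.parse
import urllib.request

# Assumption: the local node exposes its RPC API at the default address.
API_BASE = "http://127.0.0.1:5001/api/v0"


def peer_command_url(command, multiaddr, base=API_BASE):
    """Build the URL for a command that takes a peer multiaddr as its argument."""
    return f"{base}/{command}?{urllib.parse.urlencode({'arg': multiaddr})}"


def _post(url, timeout=30.0):
    """Send an empty POST to the RPC API and return the raw response body."""
    request = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode()


def connect_now(multiaddr):
    """Open a direct connection to the peer for the current session."""
    return _post(peer_command_url("swarm/connect", multiaddr))


def add_bootstrap(multiaddr):
    """Store the peer in the bootstrap list that is used when the node starts."""
    return _post(peer_command_url("bootstrap/add", multiaddr))
```

For example, `connect_now("/ip4/203.0.113.7/tcp/4001/p2p/<peer-id>")` would try to dial that peer immediately, with `<peer-id>` replaced by the ID of one of your own nodes.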

@wclayf thanks for the response. Swarm connect would help a lot if I knew the peer node hosting the file I’m looking for. Is there a way to fetch the principal node (or any other node having a file I’m interested in)? I couldn’t find a way to do this.

As for Bootstrap Adding, I did add all the high-traffic, public IPFS nodes specified in the documentation to my config file. I didn’t notice a difference in get speeds. Along the same lines, do you know if it’s better to fetch a newly created file (not within my swarm, but rather from a public node) using my own IPFS node or by calling the main ipfs.io/ipfs gateway (or even a centralized IPFS service provider such as Infura)?

Thank you kindly!!

If you don’t have any advance knowledge about what peer might be holding a pin on some arbitrary CID, I’d say the best performance would be to just run your own gateway, and let the DHT do its thing. You can then guarantee that at least you know your own gateway isn’t overloaded (as ipfs.io theoretically can be).

However, it may be the case that you can find some pinning service that can provide better performance than your own gateway. It’s hard to predict because too many dynamic factors are involved.

I wonder if IPFS has something like “tracert” which can analyze the steps a gateway takes during retrieval, to help identify bottlenecks? I don’t know.

If you’re concerned about why some of your content is not loading as quickly as you’d think it should, I would check on http://ipfs-check.on.fleek.co/ to see whether your data passes the basic smoke tests.

In general when someone runs into a problem where IPFS is slow at retrieval it is one of the following problems:

  • The data has not (yet) been advertised in the DHT
  • The data is only stored on nodes that are behind a NAT or firewall

Neither of these is a problem with the node retrieving the data; both are problems with the node providing it.
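Both conditions can be checked from the retrieving side by asking the DHT who provides a CID. Below is a minimal sketch, assuming a local node on the default 127.0.0.1:5001 and the `routing/findprovs` RPC command (older releases expose it as `dht/findprovs`). The helper names are made up, and the event type `4` is my understanding of how provider records are marked in the streamed output.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the local node exposes its RPC API at the default address.
API_BASE = "http://127.0.0.1:5001/api/v0"
PROVIDER_EVENT = 4  # assumed event type that carries provider records


def parse_providers(lines):
    """Collect {peer ID: [addresses]} from newline-delimited JSON events."""
    providers = {}
    for raw in lines:
        text = raw.decode() if isinstance(raw, bytes) else raw
        text = text.strip()
        if not text:
            continue
        event = json.loads(text)
        if event.get("Type") != PROVIDER_EVENT:
            continue  # skip query progress events
        for peer in event.get("Responses") or []:
            providers.setdefault(peer["ID"], []).extend(peer.get("Addrs") or [])
    return providers


def find_providers(cid, num_providers=5, timeout=60.0):
    """Ask the local node which peers advertise the CID."""
    query = urllib.parse.urlencode({"arg": cid, "num-providers": num_providers})
    request = urllib.request.Request(
        f"{API_BASE}/routing/findprovs?{query}", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_providers(response)  # the response iterates line by line
```

An empty result suggests the data has not been advertised yet. Providers that list only private or relay addresses hint at the NAT or firewall case.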

If you’re running into these problems the ways I’d currently recommend to resolve them are: