Optimizing Pinned Cluster Content for Fastest Download Speed

shawnp0wers · November 23, 2020, 10:25pm

Hello all – my team is integrating js-ipfs into an app which downloads large files (from a couple GB to 30GB or so) from our IPFS nodes, which are all connected via ipfs-cluster. The files are all pinned and synced, but when we download the CID (top of the merkle tree?) it usually only downloads from a single node.

Is there a way we should be preparing the files apart from the standard “ipfs add” method? Does an alternate chunking method allow bitswap to find more peers more efficiently? We have nodes in several geographical areas, and would like to get some swarming of downloads for speed, but also for redundancy if a node goes down. Since the nodes are all connected and in a cluster together, I assume the client “sees” them all.

Thanks for any insight, even if it’s just “RTFM”, I just haven’t found any information on helping with such things.

Thanks!

hector · November 24, 2020, 10:34am

Hey,

how are you downloading? You say that you are using js-ipfs in an app, IPFS nodes (go-ipfs?) and cluster. Are you running a js-ipfs full ipfs node that connects to those in the cluster?

It may be that the js-ipfs bitswap implementation is not as optimized as the go-ipfs one, and therefore does not take advantage of the optimizations that were introduced there. It is also important that the getter node is directly connected to all the providers.

Does an alternate chunking method allow for bitswap to more efficiently find more peers? …

You could play with the chunk size (increase it to reduce bitswap overhead), but the layout should already be OK for big files like yours (it will result in lots of leaves that can potentially be fetched in parallel).
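
To get a feel for what the chunk size changes, here is a rough back-of-the-envelope sketch in Python. It assumes the fixed-size chunker, the go-ipfs default chunk size of 262144 bytes (256 KiB), and a 1 MiB alternative such as one would request with `ipfs add --chunker=size-1048576`. The helper name `leaf_count` is made up for illustration and is not part of any IPFS API.

```python
DEFAULT_CHUNK = 256 * 1024  # assumed go-ipfs default for the fixed-size chunker


def leaf_count(file_size: int, chunk_size: int = DEFAULT_CHUNK) -> int:
    """Number of leaf blocks a fixed-size chunker would produce.

    Hypothetical helper: it only does the division and ignores the
    intermediate DAG nodes that link the leaves together.
    """
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size > 0")
    # Integer ceiling division, so a partial last chunk still counts as a block.
    return -(-file_size // chunk_size)


if __name__ == "__main__":
    thirty_gib = 30 * 1024**3
    print(leaf_count(thirty_gib))              # leaves at 256 KiB chunks
    print(leaf_count(thirty_gib, 1024 * 1024)) # leaves at 1 MiB chunks
```

Either way a 30 GiB file ends up as tens of thousands of leaves, so there is no shortage of blocks to request in parallel; a larger chunk size mainly cuts the number of bitswap requests by a factor of four.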

shawnp0wers · November 24, 2020, 2:19pm

Thanks for the reply, Hector,

We’re downloading using js-ipfs in an app. (We do get the same sort of performance with the go-ipfs app, though: the download occasionally multiplexes, but not often.)

Would it help to identify (or bootstrap?) a list of our nodes when spinning up the js-ipfs instance? If so, is there a proper way to do that?

Thanks again,
-Shawn

hector · November 24, 2020, 5:52pm

How do you check if things are multiplexing?

Would it help to identify (or bootstrap?) a list of our nodes when spinning up the js-ipfs instance? If so, is there a proper way to do that?

You should definitely ensure that your downloader stays connected to all of the other nodes for its whole lifetime. go-ipfs has a Peering config for this. I’m not sure about js-ipfs. Can you check whether it works better with go-ipfs in that case? Honestly, I would expect go-ipfs to be much faster than js-ipfs in terms of bitswap, particularly in the latest version (I don’t know if the improvements were ported though).

shawnp0wers · November 24, 2020, 9:31pm

We literally watch outgoing bandwidth from our peers. Sometimes (rarely) it pulls from two, but usually it randomly picks a node and downloads the entire file from there.

I’ll look at peering. I’m not sure if it can be done with js-ipfs or not, but I can check the go nodes. We bootstrapped them with each other’s info, but perhaps that’s different from peering.

reload · November 25, 2020, 10:16am

js-ipfs doesn’t have “peering”, I think; go-ipfs implemented it fairly recently. Also, peering is different from bootstrapping, and you only need to pass a PeerID (the DHT is queried if you don’t pass a multiaddr). Not sure if that would solve the non-multiplexing (you shouldn’t have to do any specific config for that …)
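
For the go-ipfs side, the peering entries live under `Peering.Peers` in the node config, each with an `ID` and an optional `Addrs` list. Below is a minimal Python sketch that builds that fragment from a list of cluster nodes. The peer IDs and multiaddrs in the usage part are placeholders rather than real nodes, and `peering_fragment` is a hypothetical helper name.

```python
import json


def peering_fragment(peers):
    """Build the "Peering" section of a go-ipfs config.

    peers: iterable of (peer_id, addrs) pairs, where addrs is a list of
    multiaddr strings and may be empty (the entry then carries only the ID).
    """
    entries = []
    for peer_id, addrs in peers:
        entry = {"ID": peer_id}
        if addrs:
            entry["Addrs"] = list(addrs)
        entries.append(entry)
    return {"Peering": {"Peers": entries}}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cluster_nodes = [
        ("QmPlaceholderPeerID1", ["/ip4/203.0.113.10/tcp/4001"]),
        ("QmPlaceholderPeerID2", []),  # no address given
    ]
    print(json.dumps(peering_fragment(cluster_nodes), indent=2))
```

The printed JSON would be merged into the downloader node’s config file by hand (or with whatever tooling you already use), followed by a daemon restart.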

meowdada · December 16, 2020, 8:57am

If I understand it correctly, IPFS can boost download speed by avoiding downloading duplicate blocks. This implies that a file with a high dedup ratio can benefit a lot. However, IPFS so far doesn’t seem to support parallel download from multiple peers the way BitTorrent does.
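
As a rough illustration of the dedup point, the Python sketch below splits data into fixed-size chunks, hashes each chunk with SHA-256, and counts how many distinct hashes there are. It is a simplification: real IPFS blocks are addressed by CIDs rather than bare digests, and `dedup_stats` is a made-up helper, not an IPFS function.

```python
import hashlib


def dedup_stats(data: bytes, chunk_size: int = 256 * 1024):
    """Return (total_chunks, unique_chunks) for fixed-size chunking.

    Chunks with identical bytes hash to the same digest, so a
    content-addressed store would only need to fetch them once.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be > 0")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique = {hashlib.sha256(chunk).digest() for chunk in chunks}
    return len(chunks), len(unique)


if __name__ == "__main__":
    # Highly repetitive input: every chunk is identical.
    total, unique = dedup_stats(b"a" * 4096, chunk_size=1024)
    print(f"{total} chunks, {unique} unique")
```

Typical large media or archive files have few identical chunks, so this effect is unlikely to matter much for them.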

The following links point to the Go code behind ipfs get <path>. You might be interested in checking them out.

IPFS CLI get command: https://github.com/ipfs/go-ipfs/blob/master/core/commands/get.go
go-Unixfs: https://github.com/ipfs/go-ipfs/blob/master/core/coreapi/unixfs.go
NewUnixfsFile: https://github.com/ipfs/go-unixfs/blob/master/file/unixfile.go
DagReader: https://github.com/ipfs/go-unixfs/blob/master/io/dagreader.go
ipld.Walker: https://github.com/ipfs/go-ipld-format/blob/master/walker.go
