Customizing sharding & replication in ipfs-cluster

There’s some great progress going on at https://github.com/ipfs/ipfs-cluster/ to help pin and persist data on IPFS. So first, kudos to the team :rocket:

I am trying to understand the project deep in the weeds and have three questions:
1. Is the cluster fault tolerant when serving files?
Suppose I add a file (say X) to IPFS and pin it to the cluster with a replication factor of 2. My understanding is that the file is then pinned to 2 nodes (say N1 and N2) in a cluster of size M. Now suppose both N1 and N2 crash (Byzantine or not) after some time. Would the file still be available on client request?

2. Does the cluster support customisable sharding?
Can the same file X pinned in the cluster be divided into S shards (say x1, x2, x3, …, xS) in an overlapping manner across the M nodes in the cluster?

3. Can we target replication and sharding at a specific node or set of nodes?
Suppose I am more confident replicating a file to specific nodes, identified by their multiaddrs (A1, A2, etc.). Is it possible (or part of the planned roadmap) to let this decision rest with the user who adds the file?

It would be helpful if somebody could address these questions. Thanks in advance.


Hi,

Technically speaking, there are cluster peers N1 and N2 running alongside IPFS peers IPFS1 and IPFS2. The IPFS daemons are the ones providing the content, regardless of whether the cluster peers are running or not. Of course, things won’t work if the IPFS daemons die and they were the only providers.
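To illustrate the point that availability depends on the IPFS daemons, not on cluster: assuming a running `ipfs` daemon and a pinned CID (the CID below is a placeholder), you can check who is providing the content directly on the IPFS layer, with no cluster peer involved:

```shell
# List IPFS peers announcing themselves as providers of a CID on the DHT.
# This works whether or not any ipfs-cluster peer is alive, because content
# is provided by the go-ipfs daemons themselves.
ipfs dht findprovs QmExamplePinnedCidPlaceholder

# Fetching the content likewise only needs one reachable provider:
ipfs cat QmExamplePinnedCidPlaceholder > /dev/null && echo "content still available"
```

If the daemons on N1 and N2 are both down and nobody else holds the blocks, the fetch will fail or hang, which is exactly the scenario described above.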

There is a bunch of code in Cluster to do exactly this, but we are missing support from IPFS for “recursive pinning to a max depth”. The idea is that if a file needs to be split among several daemons, IPFS should not try to recursively pin the whole thing as soon as it receives a node from the DAG. Some discussion has happened and can be followed here: https://github.com/ipfs/go-ipfs/issues/5133. Even though this could be done with the current pinning system, the IPFS team decided they want to refactor/rewrite the whole thing before adding support for it. Whenever that lands, Cluster should be able to make use of it with relative ease, as most of the code is ready.

We just merged the ability to provide a custom list of “allocations” to pin content to. This list is a preference list and takes priority over the allocations decided by cluster (which, by default, are chosen based on free IPFS repository space). This will be available in the next release.
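As a sketch of how that could look from the command line (the peer IDs and CID below are placeholders, and the exact flag names may differ in your release; check `ipfs-cluster-ctl pin add --help`):

```shell
# Pin a CID with replication factor 2, preferring two specific cluster
# peers identified by their peer IDs. The allocations list is a
# preference that overrides cluster's default free-space allocation.
ipfs-cluster-ctl pin add \
  --replication 2 \
  --allocations 12D3KooWExamplePeerOne,12D3KooWExamplePeerTwo \
  QmExampleCidPlaceholder
```

This addresses question 3 above: the user adding the content, rather than the cluster's allocator, decides where it lands.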


Hi there, it’s been a whole year now, any updates on the customisable sharding?

No, we are only marginally closer.