Is there any provision to choose IPFS-Cluster peers for content replication?

Hello everyone,

I’m new to the IPFS and IPFS-Cluster concepts. I have a few questions in mind regarding content replication.

Q1. For content replication, is there a provision to choose IPFS-Cluster peers?
e.g. Consider a network with 5 IPFS-Cluster peers (P1, P2, P3, P4, P5) running; is there a way to select peers (so that content only gets replicated on P2, P3, P5) while uploading content to the network?

Q2. Is there a way to adjust the replication factor based on a file’s priority?
e.g. File.txt gets replicated on 2 peers, whereas ImportantFile.txt gets replicated on 4 peers out of 5 total peers.

Any reading material/docs would also be appreciated…

Thanks in advance…

Yes, see the --allocations and --replication options:

$ ipfs-cluster-ctl pin add --help
NAME:
   ipfs-cluster-ctl pin add - Pin an item in the cluster

USAGE:
   ipfs-cluster-ctl pin add [command options] <CID|Path>

DESCRIPTION:
   
This command tells IPFS Cluster to start managing a CID. Depending on
the pinning strategy, this will trigger IPFS pin requests. The CID will
become part of the Cluster's state and will be tracked from this point.

When the request has succeeded, the command returns the status of the CID
in the cluster and should be part of the list offered by "pin ls".

An optional replication factor can be provided: -1 means "pin everywhere"
and 0 means use cluster's default setting (i.e., replication factor set in
config). Positive values indicate how many peers should pin this content.

An optional allocations argument can be provided, allocations should be a
comma-separated list of peer IDs on which we want to pin. Peers in allocations
are prioritized over automatically-determined ones, but replication factors
would still be respected.


OPTIONS:
   --replication value, -r value          Sets a custom replication factor (overrides -rmax and -rmin) (default: 0)
   --replication-min value, --rmin value  Sets the minimum replication factor for this pin (default: 0)
   --replication-max value, --rmax value  Sets the maximum replication factor for this pin (default: 0)
   --allocations value, --allocs value    Optional comma-separated list of peer IDs
   --name value, -n value                 Sets a name for this pin
   --mode value                           Select a way to pin: recursive or direct (default: "recursive")
   --expire-in value                      Duration after which pin should be unpinned automatically
   --metadata value                       Pin metadata: key=value. Can be added multiple times
   --no-status, --ns                      Prevents fetching pin status after pinning (faster, quieter)
   --wait, -w                             Wait for all nodes to report a status of pinned before returning
   --wait-timeout value, --wt value       How long to --wait (in seconds), default is indefinitely (default: 0s)
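As a quick illustration of the replication values described in the help text above, here is a small Python model (my own sketch, not cluster code) of how -1, 0, and positive values are interpreted:

```python
# Illustrative model of the replication semantics from the help text:
# -1 = pin everywhere, 0 = use the cluster's configured default,
# positive = pin on that many peers.
def effective_replication(requested: int, cluster_default: int, n_peers: int) -> int:
    """Return how many peers will pin, per the semantics above."""
    factor = cluster_default if requested == 0 else requested
    if factor == -1:
        return n_peers  # "pin everywhere"
    # A real cluster errors if it cannot satisfy the factor; this sketch caps it.
    return min(factor, n_peers)

print(effective_replication(0, -1, 5))   # default config is "pin everywhere"
print(effective_replication(2, -1, 5))   # explicit factor of 2
```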

Thank you so much @hector for the information.

Where should I look for detailed documentation of the REST API provided by a Cluster peer?

The basic idea is to see how to override config values (replication factor, allocations) for individual HTTP POST /add requests.

I’m currently referring to the node package ipfs-cluster-api.

Correction
Here one can find documentation (JavaScript-based) of the HTTP methods supported by IPFS Cluster.

Thanks in advance…

Information on how to figure out the REST API is here https://cluster.ipfs.io/documentation/reference/api/
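For per-request overrides over HTTP, the CLI flags have query-parameter counterparts on the REST endpoints. A minimal sketch of building such a request URL in Python (the parameter names and the default port 9094 are my assumptions; verify them against the API reference linked above):

```python
from urllib.parse import urlencode

# Build a POST /add URL with per-request overrides. The parameter names
# mirror the ipfs-cluster-ctl flags; check them against the REST API docs.
params = {
    "replication-min": 2,
    "replication-max": 2,
    "name": "ImportantFile.txt",
}
url = "http://127.0.0.1:9094/add?" + urlencode(params)
print(url)
```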


Hi @hector ,

I have done the private network setup with 3 IPFS nodes and 3 IPFS-Cluster peers.
I set replication_factor_min and replication_factor_max to 0, added a file with the --replication option, and it worked perfectly.
But with the --allocations option, data is being allocated to all the peers, irrespective of the peer IDs provided in the command.

ipfs-cluster-ctl add --allocations PEERID1, PEERID3 samplefile.txt
same with
ipfs-cluster-ctl pin add --allocations PEERID1, PEERID3 samplefile.txt

Could you please point me to the right way of allocating the content?

Thank you so much.

Hi,

I think the problem is:

I have set the replication_factor_min and replication_factor_max to 0,

That is your default replication factor for pins for which you don’t specify anything. When set to 0, it is understood that you want cluster’s built-in default, which is -1 (pin everywhere).

Then you are pinning something without specifying --rmin or --rmax manually, so it gets pinned everywhere, because that is the configured setting as cluster sees it.

You need to pin like this: ipfs-cluster-ctl pin add --allocations "PEERID1,PEERID2" --replication 2 <CID>.

The given allocations will be used as preferential destinations: if your desired replication factor is lower, it would just use the first peers in that list. If it is higher, it will use the peers in the list and select the rest from the available pool.
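A rough Python model of the allocation logic just described (purely illustrative, not the actual cluster source): preferred peers are taken first, and the remaining slots are filled from the available pool.

```python
def choose_allocations(preferred, available, replication):
    """Pick peers to pin on: peers passed via --allocations come first,
    then the rest are filled from the available pool until the
    replication factor is satisfied."""
    chosen = [p for p in preferred if p in available][:replication]
    for peer in available:
        if len(chosen) >= replication:
            break
        if peer not in chosen:
            chosen.append(peer)
    return chosen

peers = ["P1", "P2", "P3", "P4", "P5"]
print(choose_allocations(["P2", "P5"], peers, 2))  # ['P2', 'P5']
print(choose_allocations(["P2"], peers, 3))        # ['P2', 'P1', 'P3']
```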

Oh nice, understood now. Thank you so much. :+1:

I have 3 nodes connected in the cluster.
I did ipfs-cluster-ctl pin add --allocations PEERID1 --replication 1 <CID>

Now the node which had the file locally is offline, so the third peer is trying to get the contents of the file using

ipfs cat <CID>. But the cursor in the terminal just hangs. The CID is pinned to PEERID1, so the other peer should be able to retrieve the content, right?

Please let me know if I am missing something or doing something wrong.

Thank you

Assuming PEERID1 is well connected to the network, can be dialed, etc., then yes.

They are all privately connected through the swarm.key, and when I run the nodes they show PEERID1 as a peer they are connected to. I have added the address in the peerstore.

But it is still not able to retrieve the contents after pinning to PEERID1, once the peer which has the actual file goes offline.

ipfs-cluster-ctl status shows one CID as PINNING and another as PIN_ERROR (context deadline exceeded).

That’s the reason I am unable to retrieve the data at the third peer.

Yeah, well, that means the content is not available on any of the online cluster peers if PEERID1 was still pinning.

Yes. Sure thank you.

Can I know if there is any reason why it is taking such a long time to pin? I was trying to add a file CID which is only 1 KB in size.

I have tried the --wait option too, but after a while it failed with an error saying it couldn’t get the pin status.

How do I solve this issue?

Thank you.

Is the peer that is pinning connected to the peer that has the content?

Yes, it does show that it is connected. It displays the connected cluster peer ID on the node which has the content.

Sorry, can you explain again your setup? Is the node that has the content part of the cluster? I’m confused about what is running and what is not running when you pin.

Sure.

Assumption all ports are open.

PEER 1, 2, 3 are the three nodes in the cluster.

I want to make PEER 1 a central node in the cluster, where every other node will pin its file hashes to PEER 1.

The commands executed on the three nodes are:

  • ipfs init --profile=badgerds
  • created a swarm.key and placed it in .ipfs in all the three nodes
  • ipfs-cluster-service init on all three nodes
  • Copied the cluster-secret key of one node and replaced it in other two nodes so as to be in the same network cluster
  • Added the PEER 2, 3 addresses in PEER 1 peerstore
  • Added the PEER 1 address in PEER 2 and PEER 3 in their respective peerstores
  • Then started ipfs daemon and ipfs-cluster-service daemon on all three machines
  • Saw the PEER 2, 3 peerid’s on PEER 1 command prompt after running ipfs-cluster-service daemon
  • Saw PEER 1 peerid on PEER 2, 3 command prompt after running ipfs-cluster-service daemon
  • Added a file on PEER 2 using: echo "Hello world" | ipfs files write --create --raw-leaves /test.txt
  • Copied the generated hash/CID of the file
  • Then ran ipfs-cluster-ctl pin add --wait --replication 1 --allocations <PEER 1 cluster id> <CID>
  • The status showed PINNING and later failed with an error saying context deadline exceeded.

Please let me know if I did something wrong or if anything is unclear.

I’m not sure if we have to add addresses in the peerstore of PEER 1.

Thank you.

I think you did things right with respect to cluster. This looks like a problem with the IPFS daemons not being connected.

What is ipfs swarm peers showing?

They are in a private network so they cannot rely on bootstrappers. You need to let them bootstrap to, for example, the ipfs daemon on peer one. Cluster attempts to connect the ipfs daemons among peers, but this depends on what IPs they report to have. Are the peers on the same LAN? If not, you need to ensure that the IPFS daemons are connected among themselves too.

As a small note, I would re-use the same peerstore file for all 3 cluster peers, containing all 3 addresses. You want Peer2 and Peer3 to be connected to each other too.
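For reference, the cluster peerstore file (by default at ~/.ipfs-cluster/peerstore, if I remember correctly) is just a list of multiaddresses, one per line. A shared file for all three peers might look like this (IPs and IDs below are made-up placeholders):

```
/ip4/10.0.0.1/tcp/9096/p2p/<PEER1 cluster ID>
/ip4/10.0.0.2/tcp/9096/p2p/<PEER2 cluster ID>
/ip4/10.0.0.3/tcp/9096/p2p/<PEER3 cluster ID>
```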

PEER1 and PEER2 are not on the same LAN, so as you mentioned I connected them through ipfs swarm connect.

Then I was able to pin the file on the PEER1. I could see the status as pinned.

I don’t want to connect any other nodes in the cluster.
The model looks like many-to-one (n−1 nodes in the cluster which are all connected to the 1 remaining node in the cluster).

When I did from PEER2
ipfs-cluster-ctl pin add --wait <file CID/hash>
The file cid was pinned on the PEER1. I could see the status as PINNED

But when I do
ipfs-cluster-ctl pin add --wait --replication 1 --allocations <PEER 1 cluster id> <file CID/hash>
The status shows as REMOTE.

Not sure how to pin a CID to a particular node/peer.

What is the value to be used for the --allocations flag? Is it the ipfs id, or the ipfs-cluster-ctl id?
And is it sufficient to give just the ID itself, or the whole address like /ip4/<IP>/tcp/9096/p2p/<peer ID>?

Thank you.

It’s the cluster peer IDs.

Note that if your replication factor is 1 and you already pinned on 1 peer, I think it won’t pin it somewhere else, because the replication factor is satisfied. The allocations flag is a priority list of where things should be pinned, but cluster does not need to make that decision since things are already pinned. So if you want to move the pin from one allocation to another, the best way is to unpin and then pin again.
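An illustrative model of that behavior (again my sketch, not cluster's actual code): new allocations are only computed when the current ones no longer satisfy the replication factor, which is why unpinning first is what frees cluster to re-allocate.

```python
def needs_reallocation(current_holders, replication):
    """Cluster only computes fresh allocations when the pins currently
    holding the CID no longer satisfy the replication factor."""
    return len(current_holders) < replication

# CID already pinned on one peer with replication factor 1:
print(needs_reallocation(["PEER1"], 1))  # False -> new allocations ignored
# After "pin rm" nothing holds it, so "pin add" allocates fresh:
print(needs_reallocation([], 1))         # True
```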