Unable to bootstrap clusters on docker with raft consensus

Hey IPFS community!

So I’m working on a project for my master’s thesis, and I need to run several test cases with clusters to understand which consensus component fits the project best.

I’m trying to run several clusters using docker. I’m running the docker-compose file provided in the documentation [https://raw.githubusercontent.com/ipfs/ipfs-cluster/master/docker-compose.yml], and I understand that the peers discover each other using mDNS, so I don’t need to bootstrap on startup or use a secret key. Everything works with crdt as expected.

To try the raft consensus, I use the same docker-compose config, except that I remove CLUSTER_CRDT_TRUSTEDPEERS from the env vars. Then I set the same secret key in the service.json of every cluster. On cluster0 I have peer_addresses: [ ] and

"consensus": {
  "raft": {
    "init_peerset": [ ]
  }
},

On the other clusters, I put the peer ID of cluster0 in init_peerset and the address of cluster0 in peer_addresses: /ip4/127.0.0.1/tcp/9096/p2p/12D3KooWA9gGWiQo3Ng8ZuoeyFDCsJisBLYwsdSZ8EJr9sWF7Nvg

Running the docker-compose I get the following output:

cluster2 | 18:43:41.420 INFO raft: Current Raft Leader: 12D3KooWEb5WfKjzh53Xyoz7Kj1CwUDyEY8N8AhsqFnnX7h89nE6

cluster1 | 18:43:41.881 INFO raft: Current Raft Leader: 12D3KooWAr1PS8D9Dj1568qsaF2N9yriRk5rj1k62zFr5pMrGpeV

cluster0 | 18:43:41.906 INFO raft: Current Raft Leader: 12D3KooWA9gGWiQo3Ng8ZuoeyFDCsJisBLYwsdSZ8EJr9sWF7Nvg

They are not seeing each other; each one considers itself the raft leader. Any suggestions on what I am missing and how I can bootstrap the clusters correctly?

Thank you

Hi!

For this approach to work:

  1. init_peerset needs to have the peer IDs of all the peers in the cluster.
  2. You need to start the cluster without any state (so, peers that have only the configuration and have not been started before).
  3. peer_addresses should list the addresses of all the cluster peers.

This means that the service.json for all peers is the same. The init_peerset means they do not need to bootstrap.
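To make the three points above concrete, here is a minimal sketch that prints the fragment every peer would share. It assumes peer_addresses sits in the cluster section of service.json, that the peer IDs are the ones each container logged as its own leader earlier in this thread, and that the containers can reach each other by compose service name on port 9096. The /dns4/ addresses and the helper name are illustrative, not taken from your setup.

```python
import json

# Peer IDs inferred from the log lines earlier in this thread (each peer
# reported itself as leader). The service names and port are assumptions.
PEERS = {
    "cluster0": "12D3KooWA9gGWiQo3Ng8ZuoeyFDCsJisBLYwsdSZ8EJr9sWF7Nvg",
    "cluster1": "12D3KooWAr1PS8D9Dj1568qsaF2N9yriRk5rj1k62zFr5pMrGpeV",
    "cluster2": "12D3KooWEb5WfKjzh53Xyoz7Kj1CwUDyEY8N8AhsqFnnX7h89nE6",
}


def raft_peerset_fragment(peers, port=9096):
    """Build the config keys that should be identical on every peer.

    Hypothetical helper: it only assembles a dict, it does not touch
    any real service.json file.
    """
    return {
        "cluster": {
            # One address per peer, reachable from the other containers.
            "peer_addresses": [
                f"/dns4/{host}/tcp/{port}/p2p/{pid}"
                for host, pid in peers.items()
            ],
        },
        # Every peer ID in the cluster, including the peer's own.
        "consensus": {"raft": {"init_peerset": list(peers.values())}},
    }


if __name__ == "__main__":
    print(json.dumps(raft_peerset_fragment(PEERS), indent=2))
```

The printed keys would be merged into each peer's service.json before the first start, with no existing raft state on any of them.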

The recommended approach (as documented in https://cluster.ipfs.io/documentation/deployment/bootstrap/) is using the --bootstrap flag to launch (it is recommended because it is simpler to explain and harder to get wrong). If you go back in the history of the docker-compose.yml, you will find a version that used Raft this way:

Look at this command hack to make new peers without state bootstrap to cluster0: https://github.com/ipfs/ipfs-cluster/blob/063c5f1b783d05897e73a9ac1ecf9682ad9c605c/docker-compose.yml#L83. Subsequent starts do not need to bootstrap anymore.

I don’t think this docker compose file works right away with the current version (at the very least you will need to add IPFS_CLUSTER_CONSENSUS=raft), but it should be easy to get it working.

Let us know how it goes.

Hello, sorry for the delayed reply.

I tried that version of docker-compose and unless there is already a compose file it’ll run into multiple errors and does not boot the cluster. After making it work it runs into the exact same problem as described above. They identify themselves as the raft leaders. But that is normal since they are starting with state in order to make them boot.

“unless there is already a compose file it’ll run into multiple errors”

I am not sure what you mean. What errors?

“After making it work it runs into the exact same problem”

Pretty sure the problem is how you made it work. Non-leader peers need to bootstrap to other peers or configure init_peerset as I described. They should not identify as leader. Either there is a single leader that was started first and the other peers bootstrap to it, or all the peers have an init_peerset, are started at the same time, and then proceed to elect a leader. As mentioned, it is very important that they start with a clean raft state in this case.

I’m so sorry for the lack of information. I basically used a hammer to make it work ¯_(ツ)_/¯.

For cluster0 I get those errors, and cluster1 gets different ones. So what I did was delete the compose folder, run the docker-compose file from the documentation to create a new one, and then run the file you sent again, and it worked.

The next part was doing what you suggested. I added CLUSTER_CONSENSUS: raft to the env vars of both clusters, added both addresses to peer_addresses and both multihashes to init_peerset, and then used the same service.json for both.

From what I understood, since I did not first start a raft cluster with no state to bootstrap around, starting both at the same time with the same config should elect one as the leader. However, I get the following error: ERROR raftlib: Failed to make RequestVote RPC to {Voter QmTU9pHkw6WwySfae81iJSmPygEpNyt3y3pio57FrtJdzy QmTU9pHkw6WwySfae81iJSmPygEpNyt3y3pio57FrtJdzy}: routing: not found logging.go:72

That error would mean that one of the peers does not know where to contact the other. Check that peer_addresses is correct? (make sure you delete compose folder every time you restart until things work). Otherwise please send full logs, as there might be more relevant errors before?

I’ve used the following address, which is the one ipfs-cluster-ctl peers ls gives: /ip4/127.0.0.1/tcp/9096/p2p/12D3KooWSAn3WXa2s6ZD9RvX6gHjhDKzUdsB4EUM7bNnLUhpEAZa
I even tried replacing p2p with ipfs. I also added the address of the other cluster.

Also, deleting the compose folder leads to the errors I showed in the screenshot, and I noticed that cluster1 does not generate any files. There are no more error messages apart from the ones I showed.
I guess I might try to use the 3 PCs I have at home and run one cluster peer on each.

That’s a localhost address. I don’t think you are running several peers on the same host, so this is not the address other peers can use to reach that peer. You need to add the address at which that peer is reachable on the network it shares with the others, and you need to make sure it is accessible.
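If it helps, a quick way to catch this before starting the peers is to scan the entries in peer_addresses for loopback IPs. This is only a sketch using Python's standard ipaddress module; the function name is made up for illustration, and it only looks at /ip4 and /ip6 components (it will not flag a /dns4/localhost entry).

```python
import ipaddress


def is_loopback_multiaddr(addr: str) -> bool:
    """Return True if the multiaddr's IP component is a loopback address.

    Hypothetical helper: it treats the multiaddr as alternating
    protocol/value pairs, which holds for simple addresses such as
    /ip4/<ip>/tcp/<port>/p2p/<peer ID>.
    """
    parts = addr.strip("/").split("/")
    for proto, value in zip(parts[0::2], parts[1::2]):
        if proto in ("ip4", "ip6"):
            return ipaddress.ip_address(value).is_loopback
    # No IP component found (for example a /dns4/ address).
    return False


if __name__ == "__main__":
    # The address quoted earlier in this thread.
    addr = "/ip4/127.0.0.1/tcp/9096/p2p/12D3KooWSAn3WXa2s6ZD9RvX6gHjhDKzUdsB4EUM7bNnLUhpEAZa"
    if is_loopback_multiaddr(addr):
        print("loopback address, other peers cannot dial this:", addr)
```

Any entry this flags is one that another container or machine cannot use to dial the peer.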

Honestly, the compose example works out of the box, and worked out of the box when Raft was used. You need to take the new docker-compose and port the bootstrap thing from the old. Your log indicates that your ipfs address is not correct and probably your API is not listening on 0.0.0.0. Have you kept the environment part as in https://github.com/ipfs/ipfs-cluster/blob/master/docker-compose.yml#L50 ?