Ipfs-cluster: Unable to connect to other peers

I’m trying to create an ipfs cluster.
I have 3 machines running on the same network and I want to “clusterize” them.
Each machine is running an ipfs-cluster Docker container.

I managed to connect the “ipfs” peers by bootstrapping them to each other.
However, I’m not able to connect the “ipfs-cluster” peers.

When I attempt to connect one peer to another, I always get an error message.

All service.json files have the same secret.

Command:

docker exec ipfs-node ipfs-cluster-ctl --debug peers add /ip4/<ip>/tcp/9096/ipfs/QmW9wAZtDG8EkiP2NkMEQUhRLCit1ocG2JSd4QwoDLQYUz

Logs:

14:36:13.137 DEBUG cluster-ct: POST: http://127.0.0.1:9094/peers main.go:465
An error ocurred:
  Code: 500
  Message: dial attempt failed: <peer.ID bRwyFj> --> <peer.ID W9wAZt> dial attempt failed: i/o timeout
14:36:23.143 DEBUG cluster-ct: Response body: {"code":500,"message":"dial attempt failed: \u003cpeer.ID bRwyFj\u003e --\u003e \u003cpeer.ID W9wAZt\u003e dial attempt failed: i/o timeout"}
 main.go:485


14:44:48.714 DEBUG cluster-ct: POST: http://127.0.0.1:9094/peers main.go:465
14:44:48.715 DEBUG cluster-ct: Response body: {"code":500,"message":"dial backoff"}
 main.go:485
An error ocurred:
  Code: 500
  Message: dial backoff


14:52:25.861 DEBUG cluster-ct: POST: http://127.0.0.1:9094/peers main.go:465
An error ocurred:
  Code: 500
  Message: dial attempt failed: <peer.ID bRwyFj> --> <peer.ID W9wAZt> dial attempt failed: EOF
14:52:35.863 DEBUG cluster-ct: Response body: {"code":500,"message":"dial attempt failed: \u003cpeer.ID bRwyFj\u003e --\u003e \u003cpeer.ID W9wAZt\u003e dial attempt failed: EOF"}
 main.go:485

machine1, service.json:

{
  "cluster": {
    "id": "QmRzjMdL8gLhq44vKnfLqMnVNhA5G1Etg26x5C7WgFsKLe",
    "private_key": "<private-key>",
    "secret": "9a42070d8e8e3016bfd5cb17e34220fdd4abb7c6e5ec0e3c2016c8545faf438d",
    "peers": [],
    "bootstrap": [],
    "leave_on_shutdown": false,
    "listen_multiaddress": "/ip4/0.0.0.0/tcp/9096",
    "state_sync_interval": "1m0s",
    "ipfs_sync_interval": "2m10s",
    "replication_factor": -1,
    "monitor_ping_interval": "15s"
  },
  "consensus": {
    "raft": {
      "heartbeat_timeout": "1s",
      "election_timeout": "1s",
      "commit_timeout": "50ms",
      "max_append_entries": 64,
      "trailing_logs": 10240,
      "snapshot_interval": "2m0s",
      "snapshot_threshold": 8192,
      "leader_lease_timeout": "500ms"
    }
  },
  "api": {
    "restapi": {
      "listen_multiaddress": "/ip4/127.0.0.1/tcp/9094",
      "read_timeout": "30s",
      "read_header_timeout": "5s",
      "write_timeout": "1m0s",
      "idle_timeout": "2m0s",
      "basic_auth_credentials": null
    }
  },
  "ipfs_connector": {
    "ipfshttp": {
      "proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095",
      "node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
      "connect_swarms_delay": "7s",
      "proxy_read_timeout": "10m0s",
      "proxy_read_header_timeout": "5s",
      "proxy_write_timeout": "10m0s",
      "proxy_idle_timeout": "1m0s"
    }
  },
  "monitor": {
    "monbasic": {
      "check_interval": "15s"
    }
  },
  "informer": {
    "disk": {
      "metric_ttl": "30s",
      "metric_type": "freespace"
    },
    "numpin": {
      "metric_ttl": "10s"
    }
  }
}

machine2, service.json:

{
  "cluster": {
    "id": "QmWoon78YVxzixMX3AG54mpZNuQsGYXjvTuj9XfsjVSDv9",
    "private_key": "<private-key>",
    "secret": "9a42070d8e8e3016bfd5cb17e34220fdd4abb7c6e5ec0e3c2016c8545faf438d",
    "peers": [],
    "bootstrap": [],
    "leave_on_shutdown": false,
    "listen_multiaddress": "/ip4/0.0.0.0/tcp/9096",
    "state_sync_interval": "1m0s",
    "ipfs_sync_interval": "2m10s",
    "replication_factor": -1,
    "monitor_ping_interval": "15s"
  },
  "consensus": {
    "raft": {
      "heartbeat_timeout": "1s",
      "election_timeout": "1s",
      "commit_timeout": "50ms",
      "max_append_entries": 64,
      "trailing_logs": 10240,
      "snapshot_interval": "2m0s",
      "snapshot_threshold": 8192,
      "leader_lease_timeout": "500ms"
    }
  },
  "api": {
    "restapi": {
      "listen_multiaddress": "/ip4/127.0.0.1/tcp/9094",
      "read_timeout": "30s",
      "read_header_timeout": "5s",
      "write_timeout": "1m0s",
      "idle_timeout": "2m0s",
      "basic_auth_credentials": null
    }
  },
  "ipfs_connector": {
    "ipfshttp": {
      "proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095",
      "node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
      "connect_swarms_delay": "7s",
      "proxy_read_timeout": "10m0s",
      "proxy_read_header_timeout": "5s",
      "proxy_write_timeout": "10m0s",
      "proxy_idle_timeout": "1m0s"
    }
  },
  "monitor": {
    "monbasic": {
      "check_interval": "15s"
    }
  },
  "informer": {
    "disk": {
      "metric_ttl": "30s",
      "metric_type": "freespace"
    },
    "numpin": {
      "metric_ttl": "10s"
    }
  }
}

machine1, docker exec ipfs-node ipfs-cluster-ctl id:

QmbRwyFjjKVatfZDRxNKUtnRoWqZp7Eh4gKG8q2eqdktqX | 0 peers
  > Addresses:
    - /ip4/127.0.0.1/tcp/9096/ipfs/QmbRwyFjjKVatfZDRxNKUtnRoWqZp7Eh4gKG8q2eqdktqX
    - /ip4/172.18.0.4/tcp/9096/ipfs/QmbRwyFjjKVatfZDRxNKUtnRoWqZp7Eh4gKG8q2eqdktqX
  > IPFS: QmNPbmXBhnJGJEFyuCZAHRc2t6vtZaPBKZcmJbHaRt5uWz
    - /ip4/127.0.0.1/tcp/4001/ipfs/QmNPbmXBhnJGJEFyuCZAHRc2t6vtZaPBKZcmJbHaRt5uWz
    - /ip4/172.18.0.4/tcp/4001/ipfs/QmNPbmXBhnJGJEFyuCZAHRc2t6vtZaPBKZcmJbHaRt5uWz

machine2, docker exec ipfs-node ipfs-cluster-ctl id:

QmW9wAZtDG8EkiP2NkMEQUhRLCit1ocG2JSd4QwoDLQYUz | 0 peers
  > Addresses:
    - /ip4/127.0.0.1/tcp/9096/ipfs/QmW9wAZtDG8EkiP2NkMEQUhRLCit1ocG2JSd4QwoDLQYUz
    - /ip4/172.18.0.4/tcp/9096/ipfs/QmW9wAZtDG8EkiP2NkMEQUhRLCit1ocG2JSd4QwoDLQYUz
  > IPFS: QmabDq9z9DYZQWpCdY7eNT1rxjjpJJQ77RwfLo6TcAggfD
    - /ip4/127.0.0.1/tcp/4001/ipfs/QmabDq9z9DYZQWpCdY7eNT1rxjjpJJQ77RwfLo6TcAggfD
    - /ip4/172.18.0.4/tcp/4001/ipfs/QmabDq9z9DYZQWpCdY7eNT1rxjjpJJQ77RwfLo6TcAggfD

What am I doing wrong?

Your containers don’t seem to have connectivity to each other. Also, they seem to run on the same IP (both report 172.18.0.4).

They have connectivity between each other because they are connected via IPFS:

machine1:

docker exec ipfs-node ipfs swarm peers
/ip4/<ip-2>/tcp/4001/ipfs/QmNPbmXBhnJGJEFyuCZAHRc2t6vtZaPBKZcmJbHaRt5uWz

machine2:

docker exec ipfs-node ipfs swarm peers
/ip4/<ip-1>/tcp/1024/ipfs/QmabDq9z9DYZQWpCdY7eNT1rxjjpJJQ77RwfLo6TcAggfD

They couldn’t run on the same IP because they are 2 completely different “physical” machines (on the same network).

You are getting a libp2p error that says there is no connectivity. Please check that machine1 can reach machine2 on port 9096. I don’t know your network setup or how you are running docker, but I can tell that either the IPs you are using for peers add are incorrect or your ports are not accessible from one machine to the other.
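A minimal sketch of the reachability check suggested above, assuming Python is available on the machine (or in the container) that issues the dial. The function name `can_connect` and the address `192.0.2.10` are placeholders, not part of ipfs-cluster. It only tests plain TCP reachability; a successful connect does not prove that the libp2p handshake on that port will also succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example (placeholder address, replace with the other machine's IP):
#   can_connect("192.0.2.10", 9096)
```

Running it both from the host and from inside the container can help tell whether the problem is the network between the machines or the way the container's ports are published.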

Ok thanks I will check and let you know.

So I tried some commands, but it still doesn’t work.

Ports are open:

tcp6       0      0 [::]:9094               [::]:*                  LISTEN
tcp6       0      0 [::]:9095               [::]:*                  LISTEN
tcp6       0      0 [::]:9096               [::]:*                  LISTEN

Then I connected via telnet, and it works:

telnet my-ip 9096

Then I tried:

docker exec ipfs-node ipfs-cluster-ctl --debug peers add /ip4/my-ip/tcp/9096/ipfs/QmUEoT5v5iHVLXyHsGD7BcCp1A8VWbUHHxDHcj2z2qR3WZ

Logs:

07:52:39.712 DEBUG cluster-ct: POST: http://127.0.0.1:9094/peers main.go:465
An error ocurred:
  Code: 500
  Message: dial attempt failed: context deadline exceeded
07:52:49.715 DEBUG cluster-ct: Response body: {"code":500,"message":"dial attempt failed: context deadline exceeded"}
 main.go:485

If I use the ipfs-cluster Docker image, is 9096 the default port for connecting peers?

Another thing: I don’t understand the purpose of this HTTP request:

07:52:39.712 DEBUG cluster-ct: POST: http://127.0.0.1:9094/peers main.go:465

Why does it make a POST request to localhost on port 9094?

And it is this request that produces the HTTP 500 status.

I don’t know if that’s the problem, but your list of open ports is for IPv6, while you are using IPv4 in your multiaddresses…

  • Is that list from inside docker? You need to correctly expose those ports on both machines or run with --net=host
  • Are you sure you are using the right IPs? Are you sure that connectivity exists from inside one container to the other one? Your error message is consistent with a filtered port which drops packets.
  • As explained in the guide and the READMEs, ipfs-cluster-ctl talks to the REST API port (9094). This is different from the cluster port (9096), which is where the cluster libp2p host (and internal RPC) listens. The REST API gives you a 500 because your request failed.
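To make the difference between the ports concrete, here is a small sketch that pulls the TCP port out of each listen multiaddress in a service.json laid out like the ones posted above. The helper names `tcp_port` and `listen_ports` are made up for illustration, and the parsing only handles simple `/ip4/.../tcp/<port>` style addresses.

```python
def tcp_port(multiaddr: str) -> int:
    """Return the TCP port from a multiaddress such as /ip4/0.0.0.0/tcp/9096."""
    parts = multiaddr.strip("/").split("/")
    # The port is the component that follows the "tcp" protocol name.
    return int(parts[parts.index("tcp") + 1])


def listen_ports(config: dict) -> dict:
    """Map each component of a parsed service.json to the port it listens on."""
    return {
        "cluster (libp2p/RPC)": tcp_port(config["cluster"]["listen_multiaddress"]),
        "REST API": tcp_port(config["api"]["restapi"]["listen_multiaddress"]),
        "IPFS proxy": tcp_port(
            config["ipfs_connector"]["ipfshttp"]["proxy_listen_multiaddress"]
        ),
    }
```

With the configuration posted in this thread (loaded with `json.load`), this would report 9096 for the cluster port, 9094 for the REST API and 9095 for the IPFS proxy. Only the cluster port needs to be reachable from the other peers; the REST API is bound to 127.0.0.1 in that config, which is why ipfs-cluster-ctl posts to localhost.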

@raucoule1u did you make sure the secret key is the same on both peers? Here is the guide for how to set up the configuration: https://github.com/ipfs/ipfs-cluster/blob/master/docs/ipfs-cluster-guide.md#the-configuration-file

The cluster section of the configuration stores a secret: a 32 byte (hex-encoded) key which must be shared by all cluster peers. Using an empty key has security implications (see the Security section). Using different keys will prevent different peers from talking to each other.
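As a rough illustration of what “a 32 byte (hex-encoded) key” means in practice, the sketch below generates such a value and checks that an existing one has the right shape. The function names are hypothetical and this is not how ipfs-cluster itself creates or validates the secret. An empty key is treated as invalid here, even though the guide only warns about its security implications.

```python
import secrets
import string


def new_secret() -> str:
    """Generate 32 random bytes and return them as 64 hex characters."""
    return secrets.token_hex(32)


def is_valid_secret(value: str) -> bool:
    """Check that value looks like a hex-encoded 32-byte key (64 hex digits)."""
    return len(value) == 64 and all(c in string.hexdigits for c in value)
```

The same generated value would then be copied into the secret field of every peer's service.json.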

The usual practice is to keep the config identical on all peers, except for the id and private_key fields.
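One way to verify that rule is to diff the parsed service.json files while skipping the two per-peer fields. This is only a sketch: `config_differences` and `IGNORED` are names invented for the example, and it assumes both files have already been loaded into dicts with `json.load`.

```python
# Dotted paths of the fields that are expected to differ between peers.
IGNORED = {"cluster.id", "cluster.private_key"}


def config_differences(a: dict, b: dict, path: str = "") -> list:
    """Return the dotted paths at which two parsed configs differ."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        here = f"{path}.{key}" if path else key
        if here in IGNORED:
            continue
        left, right = a.get(key), b.get(key)
        if isinstance(left, dict) and isinstance(right, dict):
            # Recurse into nested sections such as "consensus" or "api".
            diffs.extend(config_differences(left, right, here))
        elif left != right:
            diffs.append(here)
    return diffs
```

An empty result means the two peers agree on everything except id and private_key; a result such as `["cluster.secret"]` points at the field to fix.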

No, it’s not the secret at this point. The error message would be different in that case.