IPFS Cluster Error: merkledag: not found

Hi there,

I managed to set up an IPFS Cluster with three peers which can see each other and are able to replicate and pin files.
Yesterday an error occurred on the first node and I couldn’t find helpful information on the internet, so I’m posting it here:

I have three Amazon Linux 2 nodes running ipfs version 0.4.19-dev and ipfs-cluster-ctl version 0.7.0. They form a cluster and can see each other via bootstrap.

    ipfs-cluster-ctl peers ls
QmRCbZCyhFjNxPwzyXqUSj8FU5P121uVyQaY6QfkBKeNjC | ip-172-16-0-22.eu-central-1.compute.internal | Sees 2 other peers
  > Addresses:
    - /ip4/127.0.0.1/tcp/9096/ipfs/QmRCbZCyhFjNxPwzyXqUSj8FU5P121uVyQaY6QfkBKeNjC
    - /ip4/172.16.0.22/tcp/9096/ipfs/QmRCbZCyhFjNxPwzyXqUSj8FU5P121uVyQaY6QfkBKeNjC
    - /p2p-circuit/ipfs/QmRCbZCyhFjNxPwzyXqUSj8FU5P121uVyQaY6QfkBKeNjC
  > IPFS: QmdsFtVEhuTYcQ5f75TDqHmN2Wk8Q2rTegUWd9uLMu3M36
    - /ip4/127.0.0.1/tcp/4001/ipfs/QmdsFtVEhuTYcQ5f75TDqHmN2Wk8Q2rTegUWd9uLMu3M36
    - /ip4/3.122.101.162/tcp/4001/ipfs/QmdsFtVEhuTYcQ5f75TDqHmN2Wk8Q2rTegUWd9uLMu3M36
    - /ip6/::1/tcp/4001/ipfs/QmdsFtVEhuTYcQ5f75TDqHmN2Wk8Q2rTegUWd9uLMu3M36
QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db | ip-172-16-0-83.eu-central-1.compute.internal | Sees 2 other peers
  > Addresses:
    - /ip4/127.0.0.1/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
    - /ip4/172.16.0.83/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
    - /p2p-circuit/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
  > IPFS: QmdDNPUp4B8Y2B4xrAF6K6UEKbaAscgcAgpDj8yegZBpN3
    - /ip4/127.0.0.1/tcp/4001/ipfs/QmdDNPUp4B8Y2B4xrAF6K6UEKbaAscgcAgpDj8yegZBpN3
    - /ip6/::1/tcp/4001/ipfs/QmdDNPUp4B8Y2B4xrAF6K6UEKbaAscgcAgpDj8yegZBpN3
Qmaf5QBT34GCYDHRsukZznwhTQ8J9hfuxDB9sqtPwHwryt | ip-172-16-0-17.eu-central-1.compute.internal | Sees 2 other peers
  > Addresses:
    - /ip4/127.0.0.1/tcp/9096/ipfs/Qmaf5QBT34GCYDHRsukZznwhTQ8J9hfuxDB9sqtPwHwryt
    - /ip4/172.16.0.17/tcp/9096/ipfs/Qmaf5QBT34GCYDHRsukZznwhTQ8J9hfuxDB9sqtPwHwryt
    - /p2p-circuit/ipfs/Qmaf5QBT34GCYDHRsukZznwhTQ8J9hfuxDB9sqtPwHwryt
  > IPFS: QmXpV7QPJvejXtRnY9F8aprYcfxApp2n1w6TszBb6C2rK5
    - /ip4/127.0.0.1/tcp/4001/ipfs/QmXpV7QPJvejXtRnY9F8aprYcfxApp2n1w6TszBb6C2rK5
    - /ip4/3.122.104.101/tcp/4001/ipfs/QmXpV7QPJvejXtRnY9F8aprYcfxApp2n1w6TszBb6C2rK5
    - /ip6/::1/tcp/4001/ipfs/QmXpV7QPJvejXtRnY9F8aprYcfxApp2n1w6TszBb6C2rK5

The files that I uploaded are replicated and pinned across all nodes.
At first everything worked as expected: I was able to add files from each of the three nodes and cat them from the other two. But then I got an error on the first node saying that the merkledag could not be found.

What I don’t understand is that pin ls on the first node seems to work just fine: all the pins are listed correctly. But when I pick a multihash from the list of pinned files and try to cat it, it throws the error described above.

ipfs-cluster-ctl pin ls
QmfCrmuqpsmcuje7qZtuMctBASGt5iN8p5L5fahNAUJh7S |  | PIN | Repl. Factor: -1 | Allocations: [everywhere] | Recursive
QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t |  | PIN | Repl. Factor: -1 | Allocations: [everywhere] | Recursive
QmeomffUNfmQy76CQGy9NdmqEnnHU9soCexBnGU3ezPHVH |  | PIN | Repl. Factor: -1 | Allocations: [everywhere] | Recursive
QmYxBJjYvb2LmcfbZMLNegD9WJxf8EMpdaKhxTj1iW21ym |  | PIN | Repl. Factor: -1 | Allocations: [everywhere] | Recursive

ipfs cat QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t
Error: merkledag: not found

But this problem occurs only on the first node; the other two nodes are still able to cat the pinned files.

The log of the first node might also be helpful. It seems that the ipfs-cluster daemon is shut down by systemd, but when I check it with systemctl status ipfs-cluster, the service is still active. Stopping and restarting the service didn’t change the error.
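For completeness, these are roughly the commands I used to check and restart the service (the unit is called ipfs-cluster in my setup; the name may differ in yours):

    # check whether systemd thinks the cluster peer is running
    systemctl status ipfs-cluster
    # restart the service and follow its logs
    sudo systemctl restart ipfs-cluster
    journalctl -u ipfs-cluster -f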

ipfs-cluster-service[4021]: 14:06:24.374  INFO    cluster: ** IPFS Cluster is READY ** cluster.go:420
systemd[1]: Stopping ipfs-cluster-service daemon...
INFO    cluster: shutting down Cluster cluster.go:439
INFO  consensus: stopping Consensus component consensus.go:176
ERROR       raft: NOTICE: Some RAFT log messages repeat and will only be logged once logging.go:105
ERROR       raft: Failed to take snapshot: nothing new to snapshot logging.go:105
INFO    monitor: stopping Monitor pubsubmon.go:155
INFO    restapi: stopping Cluster API restapi.go:449
INFO  ipfsproxy: stopping IPFS Proxy ipfsproxy.go:180
INFO   ipfshttp: stopping IPFS Connector ipfshttp.go:184
INFO pintracker: stopping MapPinTracker maptracker.go:119
systemd[1]: Stopped ipfs-cluster-service daemon.
        -- Reboot --

Can somebody please help me?

Error: merkledag: not found means that your ipfs daemon is not running and the hash is not available locally (I think). ipfs-cluster-ctl status should tell you whether things are actually pinned at their destinations or whether an error happened. Also, it is sometimes necessary to run ipfs commands as the same user the ipfs daemon runs as, so that they see the right ipfs repository and configuration (sudo -u ipfs -i ipfs cat ..., assuming ipfs runs as the ipfs user).
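For example, something along these lines (I’m assuming the daemon runs as the ipfs user and I’m using one of the CIDs from your list; adjust both to your setup):

    # is the ipfs daemon up and answering on this box?
    sudo -u ipfs -i ipfs id
    # does the cluster think this CID reached its destinations, or did an error happen?
    ipfs-cluster-ctl status QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t
    # try cat again as the same user the daemon runs as
    sudo -u ipfs -i ipfs cat QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t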

I’m not sure about the peer being shut down by systemd. If it’s running, then it might just be an older log entry (-- Reboot -- means it was prior to a reboot?).
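If you want to check how old those entries are, journalctl can list the recorded boots and limit the output to the current one, for example (assuming your unit is named ipfs-cluster):

    # list recorded boots; "-- Reboot --" markers separate them in the journal
    journalctl --list-boots
    # show only the cluster logs from the current boot
    journalctl -u ipfs-cluster -b 0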

Thanks for your fast answer!

I ran the status command and there was no error shown:

ipfs-cluster-ctl status
QmYxBJjYvb2LmcfbZMLNegD9WJxf8EMpdaKhxTj1iW21ym :
    > ip-172-16-0-22.eu-central-1.compute.internal : PINNED | 2018-12-18T12:15:08Z
    > ip-172-16-0-83.eu-central-1.compute.internal : PINNED | 2018-12-19T14:08:54Z
    > ip-172-16-0-17.eu-central-1.compute.internal : PINNED | 2018-12-18T12:15:08Z
QmfCrmuqpsmcuje7qZtuMctBASGt5iN8p5L5fahNAUJh7S :
    > ip-172-16-0-22.eu-central-1.compute.internal : PINNED | 2018-12-19T10:08:49Z
    > ip-172-16-0-83.eu-central-1.compute.internal : PINNED | 2018-12-19T14:08:54Z
    > ip-172-16-0-17.eu-central-1.compute.internal : PINNED | 2018-12-19T10:08:49Z
QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t :
    > ip-172-16-0-22.eu-central-1.compute.internal : PINNED | 2018-12-19T12:28:33Z
    > ip-172-16-0-83.eu-central-1.compute.internal : PINNED | 2018-12-19T14:08:54Z
    > ip-172-16-0-17.eu-central-1.compute.internal : PINNED | 2018-12-19T12:28:33Z
QmeomffUNfmQy76CQGy9NdmqEnnHU9soCexBnGU3ezPHVH :
    > ip-172-16-0-22.eu-central-1.compute.internal : PINNED | 2018-12-19T13:15:08Z
    > ip-172-16-0-83.eu-central-1.compute.internal : PINNED | 2018-12-19T14:08:54Z
    > ip-172-16-0-17.eu-central-1.compute.internal : PINNED | 2018-12-19T13:15:08Z

I also ran the cat command as the ec2-user, which my ipfs daemon runs as, but got the same error:

sudo -u ec2-user -i ipfs cat QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t
Error: merkledag: not found

Here are the latest logs that I got with journalctl. If I understand them correctly, systemd restarts the ipfs-cluster daemon.

Dez 19 13:29:39 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 13:29:39.320  INFO    cluster: ** IPFS Cluster is READY ** cluster.go:420
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Stopping ipfs-cluster-service daemon...
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047  INFO    cluster: shutting down Cluster cluster.go:439
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047  INFO  consensus: stopping Consensus component consensus.go:176
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 ERROR       raft: NOTICE: Some RAFT log messages repeat and will only be logged once logging.go:105
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 ERROR       raft: Failed to take snapshot: nothing new to snapshot logging.go:105
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047  INFO    monitor: stopping Monitor pubsubmon.go:155
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047  INFO    restapi: stopping Cluster API restapi.go:449
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047  INFO  ipfsproxy: stopping IPFS Proxy ipfsproxy.go:180
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047  INFO   ipfshttp: stopping IPFS Connector ipfshttp.go:184
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047  INFO pintracker: stopping MapPinTracker maptracker.go:119
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.048 ERROR libp2p-raf: Failed to decode incoming command: stream reset transport.go:37
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Stopped ipfs-cluster-service daemon.
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Started ipfs-cluster-service daemon.
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Starting ipfs-cluster-service daemon...
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.865  INFO    service: Initializing. For verbose output run with "-l debug". Please wait... daemon.go:44
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.871  INFO    cluster: IPFS Cluster v0.7.0+gitfdfe8def9467893d451e1fcb8ea3fb980c8c1389 listening on:
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /p2p-circuit/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /ip4/127.0.0.1/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /ip4/172.16.0.83/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: cluster.go:107
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.872  INFO    restapi: REST API (HTTP): /ip4/127.0.0.1/tcp/9094 restapi.go:414
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.872  INFO    restapi: REST API (libp2p-http): ENABLED. Listening on:
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /p2p-circuit/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /ip4/127.0.0.1/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /ip4/172.16.0.83/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: restapi.go:431
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.872  INFO  ipfsproxy: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 ipfsproxy.go:205
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.872  INFO  consensus: existing Raft state found! raft.InitPeerset will be ignored raft.go:203
Dez 19 14:06:24 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:24.372  INFO  consensus: Current Raft Leader: QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db raft.go:293
Dez 19 14:06:24 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:24.374  INFO    cluster: Cluster Peers (without including ourselves): cluster.go:405
Dez 19 14:06:24 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:24.374  INFO    cluster:     - QmRCbZCyhFjNxPwzyXqUSj8FU5P121uVyQaY6QfkBKeNjC cluster.go:412
Dez 19 14:06:24 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:24.374  INFO    cluster:     - Qmaf5QBT34GCYDHRsukZznwhTQ8J9hfuxDB9sqtPwHwryt cluster.go:412
Dez 19 14:06:24 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:24.374  INFO    cluster: ** IPFS Cluster is READY ** cluster.go:420
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Stopping ipfs-cluster-service daemon...
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231  INFO    cluster: shutting down Cluster cluster.go:439
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231  INFO  consensus: stopping Consensus component consensus.go:176
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 ERROR       raft: NOTICE: Some RAFT log messages repeat and will only be logged once logging.go:105
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 ERROR       raft: Failed to take snapshot: nothing new to snapshot logging.go:105
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231  INFO    monitor: stopping Monitor pubsubmon.go:155
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231  INFO    restapi: stopping Cluster API restapi.go:449
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231  INFO  ipfsproxy: stopping IPFS Proxy ipfsproxy.go:180
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231  INFO   ipfshttp: stopping IPFS Connector ipfshttp.go:184
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231  INFO pintracker: stopping MapPinTracker maptracker.go:119
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Stopped ipfs-cluster-service daemon.
-- Reboot --
Dez 19 14:08:47 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Started ipfs-cluster-service daemon.
Dez 19 14:08:47 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Starting ipfs-cluster-service daemon...
Dez 19 14:08:48 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[2656]: 14:08:48.306  INFO    service: Initializing. For verbose output run with "-l debug". Please wait... daemon.go:44
Dez 19 14:08:48 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[2656]: 14:08:48.331  INFO    cluster: IPFS Cluster v0.7.0+gitfdfe8def9467893d451e1fcb8ea3fb980c8c1389 listening on:
Dez 19 14:08:48 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[2656]: /p2p-circuit/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db

Could there be another cause for my problem?

Regarding the restart logs: does this happen all the time? (They are from yesterday, and one of them seems to be a system reboot.)

Regarding merkledag: not found… hmm, I’m not sure. Does ipfs pin ls --type=recursive show that hash?
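Something along these lines, using the hash that fails for you (ipfs refs local lists the blocks that are physically present in the local repo, so it tells you whether the data is there at all):

    ipfs pin ls --type=recursive | grep QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t
    ipfs refs local | grep QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t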

No, the hash does not appear in the list when I run ipfs pin ls --type=recursive:

ipfs pin ls --type=recursive 
QmYxBJjYvb2LmcfbZMLNegD9WJxf8EMpdaKhxTj1iW21ym recursive
QmauaRtCNg9kEAHCoXH1Bcd25BbqBMGY6FpruVhks5ycNX recursive
QmS4ustL54uo8FzR9455qaxZwuMiUhyvMcX9Ba8nUH4uVv recursive
QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn recursive

Those logs are the latest that I got with the journalctl command. No, it doesn’t happen all the time.
You are right, that’s my fault. Those logs are from yesterday, and I produced them myself by stopping and starting the ipfs and ipfs-cluster services while trying to get rid of the error. I thought they might help, but they ended up confusing the situation - sorry!

It is contradictory that ipfs pin ls does not show the pin while ipfs-cluster-ctl status says it’s pinned.

merkledag: not found is consistent with the content not being available locally and the ipfs daemon not running on that peer.

I suggest you double-check your setup: make sure that the cluster peers are talking to the right ipfs daemons (hopefully running locally on the same box), and that the machine from which you are running the ipfs commands is actually running one of those ipfs daemons.
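A rough sketch of what I mean (the config path and key name below are the defaults for cluster 0.7.0 and may differ in your install):

    # which ipfs API is the cluster peer configured to talk to?
    grep node_multiaddress ~/.ipfs-cluster/service.json
    # what the cluster peer reports about itself, including the attached IPFS daemon
    ipfs-cluster-ctl id
    # compare with the ID of the ipfs daemon actually running on this box
    ipfs id --format="<id>\n"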