Recovering data from a failed node

Hey all,

I am trying to recover the data on an instance where we don’t know exactly what went wrong. The data is not super critical, but we would like to get it back. The last operation was increasing the GC Watermark from 100GB to 150GB; when we restarted the server afterwards, it didn’t come back online. After many attempts, it always just hangs: it never finishes booting and doesn’t shut down on a single interrupt.
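
For context, that kind of change is normally made with ipfs config; a rough sketch only (the exact key in our setup may have been the repo size limit rather than the watermark percentage):

    ipfs config Datastore.StorageMax 150GB    # upper repo size limit that GC is measured against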

Currently, when I boot up the IPFS daemon, I get:

Initializing daemon...
go-ipfs version: 0.10.0
Repo version: 11
System version: amd64/linux
Golang version: go1.16.8

Those are the only logs printed, and I cannot connect to the API or any other endpoint.

We had roughly ~100GB of data. I created another instance, moved everything there, and started it to see whether the issue is host-related. But the new server was stuck in the same booting state for more than 3 days, at which point I killed it.
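
(By “moved everything there” I mean a plain copy of the whole repo directory, roughly like the sketch below; the hostname is just a placeholder and the daemon was stopped on both ends.)

    rsync -a /root/.ipfs/ new-server:/root/.ipfs/    # copy the entire repo to the new instance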

If I try to run a simple command like ipfs add ipfs_test.txt while the daemon process is running (even though it never finishes booting, as I said), it shows Error: lock /root/.ipfs/repo.lock: someone else has the lock, and I remember seeing a “Merkle dag not found” error when the daemon is not running.

Trying to curl port 5001 results in “Connection refused”.
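
For reference, the check is just a plain HTTP API call (assuming the default API address):

    curl -X POST http://127.0.0.1:5001/api/v0/id    # any endpoint; the connection is refused before it gets this far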

We duplicated the data to a few locations right after the incident, but I have no clue how to recover from this kind of error.

We also have a list of the IPFS hashes that were pinned on this instance, if that helps with anything.

Thank you for your time!

What storage are you using? Flatfs or badger?

Any more info after starting with export IPFS_LOGGING=debug?
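
Something along these lines should capture it to a file (the file name is just an example):

    export IPFS_LOGGING=debug
    ipfs daemon 2>&1 | tee ipfs-debug.log    # capture everything the daemon prints while it tries to boot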

You can try to directly get the files through ipfs get or cat.
Even though your node is dead, if you know the CIDs you may still get your files back: they are likely to have been cached by other nodes, I would say. So you just spend a week or a month monitoring those CIDs, and when other people turn on their computers they might have your data, or pieces of it (like how pirating a TV show from 25 years ago works).
But you should take a snapshot of your server first; the data is still there even if your node is dead, copied into a folder configured in ipfs.
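
For example, from any other node that is online (the CID below is a placeholder; use the hashes from your pin list):

    ipfs get <your-cid> -o recovered/    # fetch the whole DAG into a local folder
    ipfs cat <your-cid> > file.bin       # or stream a single file's contents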

I’m using Flatfs; the related config is the following:

            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "type": "flatfs"
          "prefix": "flatfs.datastore",

I’ve routed the log to a text file, but I couldn’t find anything relevant; here it is as a gist:

Thanks a lot!

I think /repo/flatfs/shard/v1/next-to-last/2 has all your objects; just revive your ipfs and you can ipfs get them again.

What do you mean by reviving? I’ve kept the server running for ~3 days after starting ipfs daemon, and the IPFS daemon never booted up. Is there a different process for “reviving” other than running ipfs daemon?

Can you kill the process with kill -ABRT <pid> and post the resulting stacktrace? That should tell us what it is waiting on or doing.
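
For example (one way to find the pid):

    kill -ABRT $(pgrep -f "ipfs daemon")    # Go dumps all goroutine stacks to the daemon's stderr on SIGABRT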

Here are the logs from kill -ABRT <pid>, as a gist:

We think the MFS root is not available locally.

You would need to grab and compile this tool and run it while ipfs is NOT running:

mfs-replace-root QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn

This sets the MFS root to the empty folder. You can potentially set it to any other folder CID that you know is in your datastore.
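
Building and running it is roughly this sketch (exact steps depend on the tool’s README; run it with the daemon stopped):

    # inside a clone of the tool's repository
    go build
    ./mfs-replace-root QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn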

Note that if you do this, any data that was not pinned (i.e. does not show up in ipfs pin ls --type=recursive) but was referenced under MFS will no longer be protected from garbage collection. So you probably want to avoid running GC (either manually or via the --enable-gc flag) until you’ve protected your data again, either via pinning or MFS.
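
For instance, once the daemon comes back up, a rough sketch of protecting things again (CID and MFS path are placeholders):

    ipfs pin ls --type=recursive                 # what is already protected by pins
    ipfs pin add <cid>                           # pin a CID directly
    ipfs files cp /ipfs/<cid> /restored-folder   # or re-attach it under MFS instead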

Hey, thank you so much for taking the time on this. I cannot build the tool with Go under Ubuntu at this point; I’ve opened an issue on your repository.

Thank you, I’ll update the GC watermark to a higher level before running the server and then pin them all.
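
(Rough sketch of how I plan to re-pin from our saved list; the file name is just an example:)

    while read cid; do ipfs pin add "$cid"; done < pinned_cids.txt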

If you are using a server, maybe take a snapshot to back up your data one way or another. That way, in case things don’t work out, you can reinstall everything and then copy your data over, unless ipfs can be reinstalled without losing the data.
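
For example, with the daemon stopped (paths are just examples):

    tar -czf ipfs-repo-backup.tar.gz /root/.ipfs    # archive the whole repo directory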
