IPFS cluster add file error

Hi guys,

I recently encountered this error after operating IPFS for a few weeks. It only started recently (we have not made any updates lately). Everything was working fine from the beginning.


In addition, the error ERROR core/commands/cmdenv pin/pin.go:133 context canceled appears in the IPFS node logs as well.

FYI, the screenshot logs are from an IPFS cluster instance. The IPFS cluster runs in the same container as the IPFS node.

Appreciate any advice.

This error means that block/put hangs and times out on IPFS. Perhaps ipfs is in the middle of a long GC round, or it is so busy that it cannot write blocks to disk. Either way, it is an ipfs-side error.

Cluster timeouts for ipfs operations are adjustable in the cluster config (ipfshttp section), but I think the defaults are more than enough.
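To see which timeouts a given cluster peer is actually using, a small script can list them. This is only a sketch: it assumes the config is the usual service.json under ~/.ipfs-cluster, that the ipfshttp section sits under an "ipfs_connector" key, and that timeout settings have names ending in "_timeout". The helper name ipfshttp_timeouts is made up for this example.

```python
import json
from pathlib import Path


def ipfshttp_timeouts(config):
    """Return every *_timeout entry found in the ipfshttp section.

    Assumes the section lives at config["ipfs_connector"]["ipfshttp"];
    returns an empty dict if it is missing.
    """
    section = config.get("ipfs_connector", {}).get("ipfshttp", {})
    return {k: v for k, v in section.items() if k.endswith("_timeout")}


if __name__ == "__main__":
    # Assumed default location; adjust if your config lives elsewhere.
    path = Path.home() / ".ipfs-cluster" / "service.json"
    if path.exists():
        timeouts = ipfshttp_timeouts(json.loads(path.read_text()))
        for name, value in timeouts.items():
            print(f"{name} = {value}")
    else:
        print(f"no config found at {path}")
```

If the values printed match the shipped defaults, the timeouts themselves are unlikely to be the problem and the ipfs daemon is the place to look.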

This has been happening for days and for all incoming requests. I wonder what task is making IPFS so busy.

I also tried restarting the container, but no luck.

Do you have any advice on debugging this?

I encountered a new error today, @hector. I am really confused about what is happening!
Please advise if you know the cause. Many thanks!

That error is probably unrelated. I’m thinking perhaps the request is being aborted and that causes the context.Canceled. Do you have nginx in front of the cluster API or something like that?

We have an API gateway for our backend APIs. The backend internally makes requests to the cluster API. The cluster API here is just exposed internally (only the backend can make requests to it).

Thanks for the hint. Let me test it further. But personally I do not think this is the reason.

If you come up with any ideas, please let me know. Many thanks, Hector.

Your log excerpts are very short.

Do all requests fail? Just some?

Do they fail immediately? Do they hang and fail later? Do they fail when block/putting the first block, a random block, the last block? Is it always when “finalizing”?

I have trouble with your log saying “error when finalizing”: that step is just doing the pin, and I believe the context is already cancelled by then, which can only happen if the request died.

You can run ipfs-cluster-service --log-level ipfshttp:debug,adder:debug daemon and get more info in the logs.

Also, make sure your application calling the /add endpoint is prepared to both write the request and read responses at the same time. Nginx breaks the moment the server emits a response, aborting requests that have not fully sent the multiparts.
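To illustrate what "write the request and read responses at the same time" means, here is a toy sketch. It does not talk to the cluster API at all: toy_server is a made-up stand-in that replies once per line it receives (loosely like a streaming endpoint that reports each item as it is processed), and send_and_receive is a hypothetical client helper. The point is only the pattern: one thread keeps sending while another drains the replies.

```python
import socket
import threading


def toy_server(conn):
    """Stand-in for a streaming endpoint: reply once per line received."""
    with conn, conn.makefile("rb") as reader:
        for line in reader:
            conn.sendall(b"added " + line)


def send_and_receive(chunks):
    """Send chunks on one thread while reading replies on the caller's thread."""
    client, server = socket.socketpair()
    threading.Thread(target=toy_server, args=(server,), daemon=True).start()

    def writer():
        for chunk in chunks:
            client.sendall(chunk + b"\n")
        # Half-close: tells the server the request body is complete,
        # while still allowing replies to be read.
        client.shutdown(socket.SHUT_WR)

    sender = threading.Thread(target=writer)
    sender.start()
    # Drain replies while the writer thread is still sending.
    with client.makefile("rb") as reader:
        replies = [line.rstrip(b"\n") for line in reader]
    sender.join()
    client.close()
    return replies
```

With a client that sends everything first and only then reads, a large upload can stall: the server blocks writing replies nobody is reading, stops consuming input, and the client's send blocks in turn. Reading concurrently avoids that.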

Many thanks Hector. I will check it further.

In the meantime, here is some info for you:

Do all requests fail? Just some?

Yes, all the requests fail.

Do they fail immediately? Do they hang and fail later? Do they fail when block/putting the first block, a random block, the last block? Is it always when “finalizing”?

It fails immediately (right after the /add request), and the error messages are always the ones in my screenshots. The errors are the same for every failed request.

======

Some new info: we have some unpinned CIDs, and the cluster keeps retrying to pin them without success. This also started recently, at the same time as all the errors in this thread; everything was fine at the beginning. The only logs are ERROR core/commands/cmdenv pin/pin.go:133 context canceled on the IPFS side. There are no error logs on the cluster side.

Hi @hector,

The error seems to come from the DNS setup. We are checking it further. Many thanks for the hint.

====

I have a new question. We currently receive different API responses for the same API call (/add request). I wonder what config or setup results in the following differences.

Response 1:

Response 2:

The above difference seems to come from the recent update of ipfs-cluster from v0.14 → v1.0.0.
Is that correct, @hector?

Yes: ipfs-cluster/CHANGELOG.md at master · ipfs/ipfs-cluster · GitHub

Can you elaborate?

We are checking it further. I think it is something related to our load balancer. Thanks a lot for your help, Hector!

Thanks for the update. We will upgrade the cluster to the latest version soon. Great project, btw.