I have a two-node IPFS Cluster setup. The cluster operates normally for about a week and then becomes unresponsive. Once it is in this state, calls to its API time out. Restarting ipfs-cluster-service (but not the ipfs daemon) fixes the problem until it happens again.
The log contains a variety of error messages, including the following:
adder adder/util.go:58 BlockPut .. dial tcp 127.0.0.1:5001: socket: too many open files
adder adder/util.go:58 BlockPut .. dial tcp 127.0.0.1:5001: connect: connection refused
adder adder/util.go:58 BlockPut .. read tcp 127.0.0.1:41616->127.0.0.1:5001: read: connection reset by peer
adder adder/util.go:58 BlockPut .. EOF
ipfshttp ipfshttp/ipfshttp.go:749 IPFS request unsuccessful (repo/stat?size-only=true). Code: 500. Message getting disk usage at /: lstat /home/admin/.ipfs/datastore/391162.ldb: no such file or directory
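Since the first error mentions "too many open files", one thing that could be checked is whether the ipfs-cluster-service process is approaching its file descriptor limit before it hangs. Below is a minimal sketch of such a check, assuming a Linux host with /proc mounted and permission to read the target process's entries. The function names are hypothetical helpers and are not part of ipfs-cluster.

```python
import os


def open_fd_count(pid):
    """Count entries in /proc/<pid>/fd; each entry is one open descriptor."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_soft_nofile_limit(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the limit is 'unlimited' or the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def soft_nofile_limit(pid):
    """Read the soft open-files limit for a running process."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_soft_nofile_limit(f.read())
```

Calling open_fd_count(pid) and soft_nofile_limit(pid) periodically with the PID of ipfs-cluster-service would show whether the descriptor count climbs steadily over the week, which would suggest a leak rather than a limit that is simply set too low.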
Can anyone help me diagnose or fix this problem?