IPFS-Cluster loglevel settings being ignored

Hi Guys, great work - thank you!

I'm using IPFS-Cluster v0.14.1 and IPFS v0.10.0.
I have recently been load testing a 3-node private IPFS Cluster setup running on K8s.
I add randomly generated files to my cluster using a load testing tool.

I was comparing raft vs crdt consensus performance; having used raft previously, I wanted to test whether crdt would be worth switching to.

When switching to crdt, I was surprised that after load testing there is a seemingly endless stream of messages such as:

2021-12-08T23:39:03.235Z	INFO	crdt	go-ds-crdt@v0.1.22/heads.go:134	adding new DAG head: bafybeia2jc6evdkfvbo6cceq5m6yylhd6rqcltex55hwszda6n65wpoy4i (height: 2080)
2021-12-08T23:39:03.235Z	INFO	crdt	go-ds-crdt@v0.1.22/heads.go:90	replacing DAG head: bafybeiasoxe77nazrnjnbigs7vq2pgpxpyf7cu5znpyy5bktyqztjtzoui -> bafybeif6lolym7efvn5e76bq657jkuvzzheqcbm6ux2pkzyjll6xigz3oq (new height: 2097)
2021-12-08T23:39:03.235Z	INFO	crdt	go-ds-crdt@v0.1.22/heads.go:134	adding new DAG head: bafybeif3difxreor6ssix5rvq4vgyhl2xcc7gznkgp5j2fpia37ht7uwtm (height: 2082)
2021-12-08T23:39:03.296Z	INFO	crdt	go-ds-crdt@v0.1.22/heads.go:134	adding new DAG head: bafybeiawjxmfpey6h3u7hqiflz6nx7xxjijkx6mtdziriegcjklcsmdfhe (height: 2083)
2021-12-08T23:39:03.326Z	INFO	crdt	go-ds-crdt@v0.1.22/heads.go:134	adding new DAG head: bafybeih7hpkynhdibtlscnbjz2bax3uowcvlf4vy5db2g7v4frhac7rfmy (height: 2079)

These are extremely noisy logs for INFO level (several per millisecond) and completely flood my logging services. Also, due to their volume, they take up unwanted space in the logs on disk. I have no idea why they repeat endlessly after I run a load test which inserts ~25k files via ipfs-cluster over a 5-minute period.

I first tried to specifically target the crdt component and set it to log only ERRORs.
example:

exec ipfs-cluster-service daemon --stats --upgrade --loglevel info,cluster:info,crdt:error,pintracker:error 

This doesn't work; I still get the INFO-level logging shown above.

I then tried to default all components to sending only ERROR-level log messages, but this again appears to be ignored.

exec ipfs-cluster-service daemon --upgrade --loglevel error --stats --bootstrap $BOOTSTRAP_ADDR --leave

I have also noticed that the order of these command options seems to have consequences: options set after the --loglevel option are ignored depending on its placement amongst the options. Or perhaps it is due to the string with colons, i.e. info,cluster:info,crdt:error,pintracker:error, causing the rest of the command to be omitted.

I'd appreciate help on this. From load testing, I would like to use crdt, not because it has proven significantly faster (its eventual consistency may actually be undesirable for my app), but because when running in k8s it is preferable to raft consensus, as pods come and go for various reasons (deployments, updates, etc.). With raft, this caused numerous startup issues (panic errors) when the leader node was not present.

However, I won't be able to use crdt mode if the verbosity of these logs isn't resolved, either by fixing some error in how I set up the commands, or by moving this logging to a DEBUG setting, as in my opinion it shouldn't be INFO.

Other issues from performance testing have highlighted that crdt mode with the badger datastore is not viable; I've seen the issue with badger taking up too much space (very large database · Issue #1320 · ipfs/ipfs-cluster · GitHub) and have switched to leveldb, which, though significantly better, still grows far too large compared to the state files when using raft.

Hi,

The --loglevel flag is a top-level flag (global options), so it needs to go before daemon. (The --upgrade flag is a no-op.)
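For example, something along these lines should do it (a sketch based on your earlier command, with the global flag moved before the subcommand and the no-op --upgrade dropped):

    exec ipfs-cluster-service --loglevel info,cluster:info,crdt:error,pintracker:error daemon --stats --bootstrap $BOOTSTRAP_ADDR --leave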

If you are load-testing crdt by hammering it, you should enable crdt batching; otherwise you are going to be issuing a lot of CRDT objects, making your chain grow longer and having to store more things in badger. For example, batching 500 objects into a single update will create 500 fewer heads and 500 fewer entries in badger.
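If I remember the service.json layout correctly, batching is configured under the crdt section of the cluster peer config; roughly like this (the 500/5s values are only an illustration, tune them to your load):

    "consensus": {
      "crdt": {
        "batching": {
          "max_batch_size": 500,
          "max_batch_age": "5s"
        }
      }
    }

A batch is committed when either limit is reached, so a crash can lose at most one batch worth of pins (more on that below).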

Raft works very well for small, stable setups; however, it breaks down quickly with many nodes in the cluster.

In terms of database size, note that badger GC cycles are run every 15 minutes by default (I don't know if you account/wait for this in your tests), and GC makes a huge difference (a non-GCed badger store can grow to 1TB when it only needs 10GB).

I have a cluster in which badger uses 15GB. It has 6.6M pins. That gives 2.37KB per pin, which is not tiny, but it would be a very small overhead for most cases, as the average pin size would be 10 to 1000 times larger than that (I'd expect). In our case, the average size is above 8MB, so the cluster database overhead is negligible (15GB out of 55TB).


Thanks Hector for the advice, it has been a great help.

I have switched the order of the flags around so that --loglevel is passed as a top-level flag, and I can confirm that it works now. Thank you for explaining that.

Just for context regarding load testing: I run the load using Artillery (www.artillery.io), generating before each load test a new set of random 1kb files, from which a random file is chosen per request to be added via the ipfs-cluster HTTP API. (My app will have files of varying sizes being added, averaging 1kb but potentially as large as 3mb.)

The load test, when run against ipfs directly (over port 5001), shows it can handle around 300 requests per second (RPS) for the duration of the test (5 mins) with no errors, with responses returned in a timely manner (<2s).

With ipfs-cluster testing (over port 9095), I see it handles (before applying your batching suggestion) around 70-75 RPS; pushing it up to 100 RPS causes errors such as:

ERROR	adder	adder/util.go:58	BlockPut on 12D3KooWB85W3pBvwAZak3o8R7GcfvsrxrNVC5Z5iKitJbpbcucv: Post "http://127.0.0.1:5001/api/v0/block/put?format=v0&mhlen=32&mhtype=sha2-256": dial tcp 127.0.0.1:5001: connect: cannot assign requested address

Response times degrade the more pins are added, and as mentioned, with badger filling up the disk with its logs, I'd often end up with no further space available to write to disk.

But following your batching suggestion, I increased the batch size to 500 and see no errors like the one above during load tests; space doesn't run out, and I can now achieve 100 RPS on ipfs-cluster. (I understand ipfs-cluster will naturally be slower than hitting ipfs directly.)

The main issue is that it seems to take a while after the load test has finished for pins to actually finish being added (~10 mins after the tests, some pins are still being added). There seems to be quite a backlog, so I wonder about the eventual consistency and how long it takes for objects to be available on more than the local node where the pin was added.

I don't know if using large batches in a production setting will be safe for my application: if the data is held in memory and a pod crashes mid-batch, I'd lose my pins.

My application depends on pinning; the pinned data is of vital importance for regulatory purposes, amongst other things (which is in itself why I am looking to use ipfs-cluster for resiliency and redundancy should a single pod fail, and crdt seems to be the best consensus for scale).

You mentioned badger's GC cycles of 15 mins, which is the default and does happen. What exactly were you suggesting I needed to take the GC cycle into account for?
Are you suggesting that if a badger GC runs at the same time, I will lose pins?
I have looked into IPFS GC and testing when the best time is for it to occur while the app is running, but I haven't yet factored in any issue with badger's GC cycle. Should I be?

Forget everything about IPFS-GC for a moment.

IPFS-Cluster has its own badger datastore to store the pinset and a bunch of other CRDT information when using crdt. You might expect that putting 100KB into that badger store means ~100KB on disk, but that is not the case; it may be much more, because there is a write to a write-ahead log and then to the permanent table where the data is stored, and deletions also leave stale data behind (i.e. whenever the CRDT receives an update, previous heads are removed). The process to reclaim all that extra space is a badger GC operation. It does not delete any of your data, and it has nothing to do with pins; it is just internal maintenance of badger so that it uses less disk space. The interval at which it is triggered is configurable in the cluster-peer config.
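From memory, the relevant knobs live in the datastore section of the cluster peer's service.json; a rough sketch (defaults shown approximately, double-check against your generated config):

    "datastore": {
      "badger": {
        "gc_discard_ratio": 0.2,
        "gc_interval": "15m0s",
        "gc_sleep": "10s"
      }
    }

Lowering gc_interval makes badger reclaim space more often, at the cost of some extra background work.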

I did not know you were using the /add endpoint, and I'm surprised it got to 100 RPS. Cluster performs adds by directly block-putting on all destinations. This gets more inefficient the more destinations you have, and crdt batching does not improve it, as that only affects the pin request at the end of the process. For files larger than 1 block (usually 256KB), we directly recommend adding ?local=true to the request. This writes the file only to the local peer; the copy to the other peers is done via IPFS bitswap after cluster triggers the pin request on them.
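As a rough illustration against the REST API (assuming its default port 9094 and a multipart upload; adjust to whichever endpoint you are actually hitting, and the filename is just a placeholder):

    curl -X POST -F file=@random-1kb.bin "http://127.0.0.1:9094/add?local=true"

or, with the CLI, something like ipfs-cluster-ctl add --local random-1kb.bin.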

The error you get means you are running out of sockets, I think, perhaps because the cluster peer is opening too many connections to the ipfs peer, which may at the same time be getting very slow at writing all the blocks.

Using local=true will speed things up a lot, at the cost of the content not being instantly replicated everywhere. But overall I think it will reduce the time for things to get fully replicated.

What you say regarding losing a batch if the application crashes is true. Normally you either keep track, on the side, of the things you are supposed to be pinning, so that you can check if something is missing, or you wait until a pin has successfully reached the state and retry the request if it hasn't. I.e. (sketched below the list):

1 - Trigger pin request.
2 - Batch max age is set to 5s, so wait 10s.
3 - Request pin with pin ls <cid>:
  • if unpinned, consider it a failure and retrigger the pin
  • otherwise the pin arrived to the state and is tracked
4 - Potentially request pin status with status <cid> until the item appears as pinned in X cluster peers.
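A minimal shell sketch of that loop using ipfs-cluster-ctl (assuming the batch max age above, and that ctl exits non-zero when the CID is not in the pinset):

    # 1 - trigger the pin
    ipfs-cluster-ctl pin add <cid>
    # 2 - wait ~2x the batch max age
    sleep 10
    # 3 - re-trigger if the pin never made it into the shared state
    ipfs-cluster-ctl pin ls <cid> || ipfs-cluster-ctl pin add <cid>
    # 4 - optionally poll until enough peers report it as pinned
    ipfs-cluster-ctl status <cid>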

In general, I would recommend using the cluster REST API for production, as it offers more flexibility than the ipfsproxy one and is better tested.
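For reference, the REST API equivalents of the calls above look roughly like this (paths from memory, assuming the default port 9094; please double-check against the REST API docs):

    # pin a CID
    curl -X POST "http://127.0.0.1:9094/pins/<cid>"
    # check whether it is in the pinset (like pin ls)
    curl "http://127.0.0.1:9094/allocations/<cid>"
    # check the tracker status on each peer (like status)
    curl "http://127.0.0.1:9094/pins/<cid>"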

The slow performance when adding to multiple peers is at the top of my list of things to improve, so you can expect improvements soon enough (the main thing is to stream all blocks in a single request, rather than doing 1 block per request).
