Latency of object-put seems to degrade over time under highly concurrent loads

I am generating a highly concurrent load (~100 agents doing object-put, stat, pubsub, object-fetch, etc.) against one (or a small number of) ipfs node(s). Objects generated by the test are small (<100 bytes). I use the latency of object-put (HTTP round-trip time) as a measure of system performance. I find that performance degrades after a few hundred thousand object-puts.

The latency for object-put starts out at around 200 ms. After a few hundred thousand object-puts, during which a few GB of data have been written to the datastore, round-trip times degrade to around 2 seconds. This degradation continues over time, and I’ve seen it grow to ~15 seconds, at which point I’ve felt the need to reset my benchmark.
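For anyone trying to reproduce this, the measurement described above can be sketched roughly as follows. This is a minimal sketch, not the actual benchmark: `put_fn` is a hypothetical callable standing in for whatever HTTP client call performs the object-put, and `time_call` and `measure_batch` are illustrative names. Recording the median and worst latency per batch makes the drift from ~200 ms toward seconds easy to see over time.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def time_call(put_fn, payload):
    """Return the wall-clock duration of one put_fn(payload) call, in seconds."""
    start = time.perf_counter()
    put_fn(payload)
    return time.perf_counter() - start


def measure_batch(put_fn, payloads, agents=100):
    """Issue every payload through put_fn using `agents` worker threads.

    Returns (median, worst) latency in seconds for the batch.
    """
    with ThreadPoolExecutor(max_workers=agents) as pool:
        latencies = list(pool.map(lambda p: time_call(put_fn, p), payloads))
    return statistics.median(latencies), max(latencies)


if __name__ == "__main__":
    # Stand-in for a real HTTP object-put (hypothetical); replace with your client call.
    def fake_put(payload):
        time.sleep(0.01)

    payloads = [b"x" * 64 for _ in range(200)]  # small objects, < 100 bytes
    median, worst = measure_batch(fake_put, payloads, agents=20)
    print(f"median={median * 1000:.1f} ms  worst={worst * 1000:.1f} ms")
```

Running one batch at regular intervals and logging both numbers gives a latency curve that can be compared across runs and datastore backends.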

If a ‘degraded’ system is made to serve light loads, i.e. very little concurrency, then object-put latencies aren’t too bad (around 400ms).

Deleting *.ldb files under /datastore causes performance to be restored. The .ldb extension suggests that these are leveldb files.
Can someone help explain this behaviour? What is contained in the .ldb files, and is it safe to reset them from time to time?
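LevelDB uses the .ldb extension for its on-disk table files, so tracking how many of them exist and how large they are may help correlate the slowdown with datastore growth. The following is a minimal sketch for doing that; `ldb_stats` is an illustrative helper, and the `~/.ipfs/datastore` path is an assumption about the default repo location. It only measures the files and says nothing about whether deleting them is safe.

```python
from pathlib import Path


def ldb_stats(datastore_dir):
    """Return (file_count, total_bytes) for *.ldb files directly under datastore_dir."""
    files = [p for p in Path(datastore_dir).glob("*.ldb") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    # Assumed default repo location; adjust if your repo lives elsewhere.
    repo_datastore = Path.home() / ".ipfs" / "datastore"
    if repo_datastore.is_dir():
        count, total = ldb_stats(repo_datastore)
        print(f"{count} .ldb files, {total / 1e6:.1f} MB")
    else:
        print(f"{repo_datastore} not found")
```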

thanks,

It still has some bugs (e.g., GC doesn’t seem to work), but I’d be curious if badger (ipfs init --profile=badgerds) would perform any better for your use case.

This documentation seems to have a decent explanation. I don’t understand it well enough and haven’t looked into it enough to answer the second part of the question.

What happens if you run ipfs repo verify after deleting the *.ldb files? If no errors are returned, maybe it’s fine (?).

[quote="rgrover, post:1, topic:5089, full:true"]
Deleting *.ldb files under /datastore causes performance to be restored. The .ldb extension suggests that these are leveldb files.
[/quote]

[quote="leerspace, post:1, topic:5089, full:true"]
What happens if you run ipfs repo verify after deleting the *.ldb files? If no errors are returned, maybe it’s fine (?).
[/quote]

ipfs repo verify runs without errors even after deleting the .ldb files. It also runs faster after the delete, which suggests to me that verification would have read the .ldb files had they been present. I’m still not sure whether deleting the .ldb files is completely safe.

I also discovered that not using pin=true in object/put avoids the performance degradation. I’ve let my stress benchmark run for a very long time with pin=false during object/put operations and there has been no discernible slowdown.
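To compare the two cases side by side, the only difference in the request is the pin query parameter. Here is a minimal sketch of building both URLs; `object_put_url` is an illustrative helper, and the `http://127.0.0.1:5001` base address is an assumption about where the node's API is listening.

```python
from urllib.parse import urlencode


def object_put_url(api_base, pin):
    """Build the object/put endpoint URL with an explicit pin setting."""
    query = urlencode({"pin": "true" if pin else "false"})
    return f"{api_base.rstrip('/')}/api/v0/object/put?{query}"


if __name__ == "__main__":
    base = "http://127.0.0.1:5001"  # assumed local API address
    print(object_put_url(base, pin=True))
    print(object_put_url(base, pin=False))
```

Running the same load against each URL on a fresh repo should show whether the slowdown follows the pin setting.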

Is there a reliable way to pin files without paying the penalty of performance degradation?

Probably not until this is fixed: https://github.com/ipfs/go-ipfs/issues/5221