S3 datastore makes too many requests, causing increasing AWS costs

Hello,

I currently use GitHub - ipfs/go-ds-s3: An s3 datastore implementation as the datastore for my IPFS node, but it makes too many requests (HeadObject requests) to the S3 bucket, which results in significantly increased costs.

Looking at the code of go-ds-s3, these requests seem to come from the GetSize function (go-ds-s3/s3.go at master · ipfs/go-ds-s3 · GitHub).
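For context, if I understand correctly, each such size/existence check turns into a single S3 HeadObject call, roughly like the sketch below using the AWS SDK (this is not the actual go-ds-s3 code; bucket, region and key are placeholders):

// Rough sketch: every GetSize/Has-style lookup becomes one S3 HeadObject
// request, which S3 bills at the same rate as a GET request.
// Bucket, region and key are placeholders, not the real go-ds-s3 code.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
	client := s3.New(sess)

	// One size/existence check == one HeadObject request against the bucket.
	out, err := client.HeadObject(&s3.HeadObjectInput{
		Bucket: aws.String("bucket-name"),
		Key:    aws.String("bucketdirectory/SOMEBLOCKKEY"),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("size:", aws.Int64Value(out.ContentLength))
}

With millions of blocks being checked, these per-request charges add up quickly.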

Any advice on this? Can we configure the frequency of this routine?

In the worst case, we may have to stop using S3 as the datastore, so I am looking for a tutorial on how to migrate the data from S3 to the local machine.

Many thanks!

Take a look at Filebase - S3-Compatible, Edge-Caching and at a fraction of the cost of the other pinning services out there.

5GB always free - one month 5TB trial with code “IPFS”


Sorry for mentioning you @hector, but do you have any idea on this? Either how to decrease the number of requests to S3, or how to migrate the data from S3 to local machines (flatfs datastore).

Here is my current datastore_spec

{"mounts":[{"bucket":"bucket-name","mountpoint":"/blocks","region":"us-east-1","rootDirectory":"bucketdirectory"},{"mountpoint":"/","path":"datastore","type":"levelds"}],"type":"mount"}

I want to change it into

{"mounts":[{"mountpoint":"/blocks","path":"blocks","shardFunc":"/repo/flatfs/shard/v1/next-to-last/2","type":"flatfs"},{"mountpoint":"/","path":"datastore","type":"levelds"}],"type":"mount"}

Note: we also use IPFS Cluster alongside IPFS, so please advise if anything else needs to be done.

I think datastore.Has() is implemented via GetSize().

If I’m not mistaken, however, the responses to such requests are usually cached, so increasing the cache sizes might be one way to reduce them (unfortunately only the bloom filter size is configurable, via Datastore.BloomFilterSize in the config).
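For reference, this is roughly how that caching layer sits in front of the datastore. A minimal sketch using the go-ipfs-blockstore package; the numbers are illustrative, and in the daemon only the bloom filter size is exposed through the config:

// Minimal sketch (my understanding, not verbatim kubo code): the blockstore
// wraps the datastore with a bloom filter + ARC cache, so a bigger bloom
// filter means fewer Has()/GetSize() calls ever reaching S3.
package main

import (
	"context"
	"log"

	ds "github.com/ipfs/go-datastore"
	dssync "github.com/ipfs/go-datastore/sync"
	blockstore "github.com/ipfs/go-ipfs-blockstore"
)

func main() {
	ctx := context.Background()

	// Stand-in for the S3-backed datastore.
	base := blockstore.NewBlockstore(dssync.MutexWrap(ds.NewMapDatastore()))

	opts := blockstore.DefaultCacheOpts()
	opts.HasBloomFilterSize = 1 << 20 // bytes; 0 disables the bloom filter

	cached, err := blockstore.CachedBlockstore(ctx, base, opts)
	if err != nil {
		log.Fatal(err)
	}
	_ = cached // lookups absorbed by the bloom filter/ARC cache never hit S3
}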

Otherwise, if the requests are for essentially random keys, there is not much to do other than not using S3. If nodes are meant to provide content publicly, they need to check whether they have it when it is requested.


I’ve tested IPFS with S3 as the datastore, loosely following A (loosely written) Guide to Hosting an IPFS Node on AWS - Developers - Fission Talk, and added 2 GB of data to the node, which produced a lot of requests. The node had a single bootstrap node and was connected to the internet.

The point of the screenshot is not the price; it is the number of requests that happened in just a few hours of usage. This is a solid basis for estimating the potential cost of a real-world deployment. Maybe @filebase can share a real-world screenshot of their usage.