Distributing a file catalog

I’ve noticed that while adding and serving files is a solved problem in IPFS, file discovery is not.

What are some idiomatic methods to advertise a list of files?
Of course, the most obvious solution is to share a directory CID from the Files API using IPNS or DNSLink.
However, this naive solution doesn’t necessarily scale, because every consumer has to download the whole list on every change. Imagine downloading the whole filesystem tree into memory every time you open the file explorer.
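For concreteness, the naive approach is roughly this (a minimal sketch with ipfs-http-client, assuming the catalog lives under `/catalog` in MFS):

```ts
import { create } from 'ipfs-http-client'

const ipfs = create({ url: 'http://127.0.0.1:5001' })

// Take the current root CID of the MFS directory holding the catalog...
const { cid } = await ipfs.files.stat('/catalog')

// ...and point our IPNS name at it. Every consumer resolves the name
// and then fetches the entire directory listing, which is the problem.
await ipfs.name.publish(`/ipfs/${cid}`)
```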

Sane APIs would have you limit your queries (`GET /files?start=0&end=100`) so that neither end experiences undue load, or offer changes since a timestamp (`GET /files?since=1234567890`), but this isn’t possible in IPFS because it only serves static, immutable content.

It should be possible to partially alleviate the issue by structuring the CID in an intelligent way, e.g. by serving “pages” (lists of lists) which the end user can dereference at will. However, this technique is not friendly to the network if the files are constantly mutating: every update mints fresh page CIDs that initially exist only on the publisher’s node, so the catalog will lean towards centralization and be poorly distributed.
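A sketch of what I mean, using ipfs-http-client and default dag-cbor nodes (the page size, node shapes, and function name are made up):

```ts
import { create } from 'ipfs-http-client'
import type { CID } from 'multiformats/cid'

const ipfs = create({ url: 'http://127.0.0.1:5001' })

const PAGE_SIZE = 100

// Split the flat file list into fixed-size pages, then store an index
// node linking to every page. Embedded CID objects become IPLD links.
async function publishPaged(files: { name: string; cid: CID }[]): Promise<CID> {
  const pages: CID[] = []
  for (let i = 0; i < files.length; i += PAGE_SIZE) {
    pages.push(await ipfs.dag.put({ entries: files.slice(i, i + PAGE_SIZE) }))
  }
  const index = await ipfs.dag.put({ pageSize: PAGE_SIZE, pages })
  await ipfs.name.publish(`/ipfs/${index}`)
  return index
}
```

A client then fetches the small index node once with `ipfs.dag.get(indexCid)` and dereferences only the pages it actually needs.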

An alternative could be to implement the file catalog as an append-only database that undergoes occasional “garbage collection”. This reduces the amount of churn on the network and should make it possible for the user to query by timestamp. I think it sounds similar to OrbitDB’s feed store.
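A rough sketch of such a log on raw IPLD nodes rather than OrbitDB itself (the entry shape and function names are my own):

```ts
import { create } from 'ipfs-http-client'
import type { CID } from 'multiformats/cid'

const ipfs = create({ url: 'http://127.0.0.1:5001' })

interface LogEntry {
  timestamp: number  // unix seconds, set by the publisher
  added: string[]    // CIDs of files added in this batch
  prev: CID | null   // link to the previous head, or null at genesis
}

// Appending creates one small new node and republishes the head, so
// the bulk of the log stays stable and well-seeded across updates.
async function append(head: CID | null, added: string[]): Promise<CID> {
  const entry: LogEntry = {
    timestamp: Math.floor(Date.now() / 1000),
    added,
    prev: head,
  }
  const cid = await ipfs.dag.put(entry)
  await ipfs.name.publish(`/ipfs/${cid}`)
  return cid
}

// A "since" query: walk back from the head until entries get too old.
async function since(head: CID, ts: number): Promise<LogEntry[]> {
  const out: LogEntry[] = []
  let cursor: CID | null = head
  while (cursor) {
    const { value } = await ipfs.dag.get(cursor)
    const entry = value as LogEntry
    if (entry.timestamp < ts) break
    out.push(entry)
    cursor = entry.prev
  }
  return out
}
```

Garbage collection would then just mean periodically writing a compacted genesis entry and republishing from there.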


Struggling with this issue too. I’d really like to hear some solutions. As far as I know, DNSLink and IPNS are super slow. I think some people publish hashes/CIDs to Ethereum, but that isn’t super fast either. The compromise I’m considering now is, for a desktop app, to store a list of CIDs locally in a JSON file as they are updated; for a webapp, I can only use a centralized database to store CID locations. Another workaround I was thinking of was publishing JSON files to a GitHub repository every time the CIDs are updated, but that’s also not as fast or as scalable as a private server.
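For the desktop case, the local JSON part can be as simple as this (a sketch; the file name and record shape are placeholders):

```ts
import { promises as fs } from 'fs'

const CATALOG = 'catalog.json'  // hypothetical local path

// Merge newly published CIDs into the local JSON catalog as they arrive.
async function recordCids(updates: Record<string, string>): Promise<void> {
  let catalog: Record<string, string> = {}
  try {
    catalog = JSON.parse(await fs.readFile(CATALOG, 'utf8'))
  } catch {
    // first run: no catalog file yet
  }
  Object.assign(catalog, updates)
  await fs.writeFile(CATALOG, JSON.stringify(catalog, null, 2))
}
```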