Publishing and maintaining a large directory structure

I have a process that’s producing a new file every minute, and I’d like to make all of these files available over IPFS. In particular, I’d like the entire set of files to be easily accessible to anyone.

When these files are produced, I organize them into a tree of hierarchical Year/Month/Day/Hour directories, with 60 files (one per minute) in each Hour directory. Over the course of a year, I’ll produce a few hundred thousand files organized this way.

To make all of this available over IPFS, I assume that I can simply use IPNS on the top node (e.g. the Year directory). If I update this every time I add a new file (every minute), everything should be available and always up to date.

So my question is: Is this the right way to do this? It seems to me that to hash the topmost node, hashes for a few hundred thousand files will need to be calculated, which doesn’t seem very efficient. Is there a better way to do this?


That should be no problem whatsoever, as long as there are not too many children at one level, which won’t be the case with your chosen directory structure.

That’s not how it works: hashing the topmost node does not require rehashing every file. In a Merkle DAG, the hash of a branch is computed by hashing the hashes of its children. So even if you have a very large hierarchical structure, the effort for computing the new root hash is tolerable. It is usually dominated by the effort for hashing the new content, which has to be done anyway to add it to IPFS.
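To make that concrete, here is a minimal sketch of the idea in Python. It is not how IPFS encodes or hashes its nodes (plain SHA-256 over a made-up link format stands in for the real thing, and the Dir class, add_file and root_hash are hypothetical names); it only counts how many hashes get computed when one new file is added to a tree that already holds a full hour of files.

```python
import hashlib


def hash_file(data: bytes) -> str:
    # Stand-in for hashing file content (the real format differs).
    return hashlib.sha256(b"file:" + data).hexdigest()


def hash_dir(links: dict) -> str:
    # A directory hash depends only on its children's names and hashes.
    h = hashlib.sha256(b"dir:")
    for name in sorted(links):
        h.update(name.encode() + b"\0" + links[name].encode() + b"\n")
    return h.hexdigest()


class Dir:
    def __init__(self):
        self.children = {}  # name -> Dir, or a file hash (str)
        self.cached = None  # hash of this directory, if still valid


def add_file(root: Dir, path: str, data: bytes, counter: dict) -> None:
    parts = path.split("/")
    node, touched = root, [root]
    for name in parts[:-1]:
        node = node.children.setdefault(name, Dir())
        touched.append(node)
    node.children[parts[-1]] = hash_file(data)
    counter["hashes"] += 1
    # Only directories on the path from the root to the new file change.
    for d in touched:
        d.cached = None


def root_hash(node: Dir, counter: dict) -> str:
    if node.cached is None:
        links = {
            name: child if isinstance(child, str) else root_hash(child, counter)
            for name, child in node.children.items()
        }
        node.cached = hash_dir(links)
        counter["hashes"] += 1
    return node.cached


counter = {"hashes": 0}
root = Dir()
for minute in range(60):
    add_file(root, f"2018/06/01/18/{minute:02d}", b"data", counter)
old_root = root_hash(root, counter)

counter["hashes"] = 0
add_file(root, "2018/06/01/19/00", b"data", counter)
new_root = root_hash(root, counter)
print(counter["hashes"])  # 6: one file plus the five directories on its path
```

The 60 files already in the tree are not touched again; only the new file and the directories above it (hour, day, month, year, root) are rehashed.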

Hint for doing this: the ipfs object patch command is what you want. Basically, you start with an empty directory, and then use object patch add-link -p -- <current> <path> <value>, where current is the hash of the current directory, path is a / separated path, and value is the hash of the new entry (file or directory). In your case path would be e.g. ‘2018/06/01/19’.
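If you drive this from a script, the argument list could be assembled roughly like this. This is only a sketch: hour_path and add_link_command are hypothetical helper names, the hash strings are placeholders, and the command layout is copied from the description above rather than checked against a particular IPFS version.

```python
from datetime import datetime


def hour_path(ts: datetime) -> str:
    # "/"-separated Year/Month/Day/Hour path, e.g. 2018/06/01/19
    return ts.strftime("%Y/%m/%d/%H")


def add_link_command(current: str, path: str, value: str) -> list:
    # current: hash of the current root directory
    # path:    "/"-separated path of the new entry
    # value:   hash of the new entry (file or directory)
    return ["ipfs", "object", "patch", "add-link", "-p", "--", current, path, value]


cmd = add_link_command(
    "<current-root-hash>",                   # placeholder
    hour_path(datetime(2018, 6, 1, 19, 5)),
    "<new-entry-hash>",                      # placeholder
)
print(" ".join(cmd))
```

Passing the list to subprocess.run with capture_output=True would let you read the new root hash from the command's output, assuming it is printed there.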

The command produces a new directory and returns its hash, which you need to persist somewhere. The usual recommendation is to use IPNS, but an alternative is to update a DNS record directly. At actyx.io, we found using DNS (AWS Route 53) more reliable than IPNS. YMMV.
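For the DNS route, one common convention is DNSLink: a TXT record on the _dnslink subdomain whose value points at the current root hash. The sketch below only formats the record; dnslink_record is a hypothetical helper, example.com and the hash are placeholders, and I am assuming DNSLink here rather than describing any particular setup. Actually pushing the record is up to your DNS provider's API.

```python
def dnslink_record(domain: str, root_hash: str) -> tuple:
    # Returns (record name, TXT value) following the DNSLink convention.
    return (f"_dnslink.{domain}", f"dnslink=/ipfs/{root_hash}")


name, value = dnslink_record("example.com", "<new-root-hash>")  # placeholders
print(name, "TXT", value)
```

You would rewrite this record each time the root hash changes, so a short TTL is probably a good idea.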