Best way to pin multiple hashes into a single hash

I have multiple hashes representing important content, and I would like to distill all of this information into a single hash, which I can then easily ask my friends to pin for redundancy. It is possible to pin multiple hashes using ipfs pin add hash1 hash2 ..., but this does not result in a single hash for all content. Right now, I can achieve what I want with the following steps:

$ mkdir tmp
$ cd tmp
$ ipfs get hash1
$ ipfs get hash2
[...]
$ cd ..
$ ipfs add -r tmp
$ rm -rf tmp

This results in the hash of a directory that contains all hashes of interest as subdirectories (the subdirectory names are exactly the hashes of interest).

However, this process seems wasteful and roundabout. Imagine that I have thousands of hashes representing gigabytes of content. Each time one of the hashes changes, there is a lot of data that needs to move around following the above procedure. Is there a more direct way to start with a list of hashes and obtain the hash for a directory which includes those hashes as its entries, without any data even needing to leave the datastore?

You could try ipns …

  1. Generate a key:
    ipfs key gen --type ed25519 mykey

  2. Publish the folder to the key:
    ipfs name publish --key mykey <top level folder hash>

  3. The folder with all the content will be available at the key hash:
    /ipns/12D3K...

Thank you for the reply. While this is new information to me, it does not solve my problem, as it assumes I already have a top level folder hash. What I have asked for is a method of efficiently generating this hash, without needing to check all relevant content out of the datastore.

In addition to the assumptions I mentioned, let’s assume that the multiple hashes I am pinning have a high degree of overlapping content among them (i.e., descendants with identical hashes), making the procedure I suggested above especially wasteful in terms of disk space and resources.

This should give you the folder hash…

Here’s a one line bash loop for demonstration purposes:

$ mkdir test;for f in {1..10};do touch "./test/$f"; done;ipfs add -r test
added QmbFMke1...u6x1AwQH test/1
added QmbFMke1...u6x1AwQH test/10
added QmbFMke1...u6x1AwQH test/2
added QmbFMke1...u6x1AwQH test/3
added QmbFMke1...u6x1AwQH test/4
added QmbFMke1...u6x1AwQH test/5
added QmbFMke1...u6x1AwQH test/6
added QmbFMke1...u6x1AwQH test/7
added QmbFMke1...u6x1AwQH test/8
added QmbFMke1...u6x1AwQH test/9
added QmfHbthu...MTb9XrXSbW6f test

The last line contains the top level folder hash. Each of the files in the folder has the same hash… because they are all zero length empty files.

So, to put the folder under a single easy location, you can use ipns. Publish the top level folder hash to its own key… and if later new files are added… simply re-publish the folder’s new hash.

Or… to make it even easier, you can use the “files” command and then simply put all new files into the “files” folder…

Try:

ipfs files --help

for more detailed information on how to do that.

This is precisely the “solution” I already mentioned in the first post, but it does not address the shortcomings I outlined in that post and in posts #3 and #4 in this thread. In principle, the process of generating this hash should only involve the computational cost of calculating the hash of the new directory listing. However, the method given above requires disk space and computational time proportional to the entire de-duplicated content stored recursively, which is intractable for my problem.

I’m not sure what your question is here…

Are you asking for a method to generate a CID without adding any data to the IPFS datastore?

Does your use of the word “datastore” refer to the files in your filesystem rather than the IPFS datastore?

It seems logical that the computational work to generate CIDs external to IPFS would be the same as simply adding the data to the IPFS datastore. So, is your problem that you don’t have the drive space for external copies of the added data? Or do you want to pre-generate a list of CIDs before adding the files?

It’s unclear exactly what you are attempting, and why.

If you are asking about “How to create a CID”

Here’s a decent in-depth view:

I am referring to the ipfs datastore.

I have a collection of hashes of content in the ipfs datastore, representing gigabytes of content, rivaling the disk space I have available to me. I would like to create a single ipfs directory containing all of this content as subdirectories. If I understand the design of ipfs (and if it is anything like the way git stores directories), generating this single hash and pinning it should not require making an additional copy of the data outside the ipfs datastore and then hashing it again in its entirety. It should be a very cheap operation, computationally.

ipfs files --help


USAGE
ipfs files - Interact with unixfs files.

SYNOPSIS
ipfs files [--flush=false]

OPTIONS

-f, --flush bool - Flush target and ancestors after write. Default: true.

DESCRIPTION

Files is an API for manipulating IPFS objects as if they were a Unix
filesystem.

The files facility interacts with MFS (Mutable File System). MFS acts as a
single, dynamic filesystem mount. MFS has a root CID that is transparently
updated when a change happens (and can be checked with “ipfs files stat /”).

All files and folders within MFS are respected and will not be cleaned up
during garbage collections. MFS is independent from the list of pinned items
(“ipfs pin ls”). Calls to “ipfs pin add” and “ipfs pin rm” will add and remove
pins independently of MFS. If MFS content that was
additionally pinned is removed by calling “ipfs files rm”, it will still
remain pinned.

Content added with “ipfs add” (which by default also becomes pinned), is not
added to MFS. Any content can be put into MFS with the command “ipfs files cp
/ipfs/<cid> /some/path/”.

NOTE:
Most of the subcommands of ‘ipfs files’ accept the ‘--flush’ flag. It defaults
to true. Use caution when setting this flag to false. It will improve
performance for large numbers of file operations, but it does so at the cost
of consistency guarantees. If the daemon is unexpectedly killed before running
‘ipfs files flush’ on the files in question, then data may be lost. This also
applies to run ‘ipfs repo gc’ concurrently with ‘--flush=false’
operations.

SUBCOMMANDS
ipfs files chcid [<path>] - Change the cid version or hash function of the root node of a given path.
ipfs files cp <source> <dest> - Copy any IPFS files and directories into MFS (or copy within MFS).
ipfs files flush [<path>] - Flush a given path’s data to disk.
ipfs files ls [<path>] - List directories in the local mutable namespace.
ipfs files mkdir <path> - Make directories.
ipfs files mv <source> <dest> - Move files.
ipfs files read <path> - Read a file in a given MFS.
ipfs files rm <path>... - Remove a file.
ipfs files stat <path> - Display file status.
ipfs files write <path> <data> - Write to a mutable file in a given filesystem.

For more information about each command, use:
‘ipfs files <subcmd> --help’


I would use IPNS for this … just like my first post.

Once you generate your key and publish the top level directory CID to it, all you need to do is distribute the IPNS key hash. If you add more files to the directory, simply publish the new CID to the same key. In this way, the newest version of your file set will always be available at that same IPNS location.
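
As a rough sketch of that re-publish step, assuming the key is called mykey as in the earlier post: the helper name republish and the IPFS variable are made up for illustration (the variable only exists so the command can be dry-run with echo), and the CID you pass in is whatever your top level directory currently hashes to.

```shell
#!/bin/sh
# Sketch only. IPFS defaults to the real CLI; set IPFS=echo for a dry run.
IPFS="${IPFS:-ipfs}"

# republish KEY CID   (hypothetical helper name)
# Points the IPNS name owned by KEY at the given directory CID.
republish() {
    key="$1"
    cid="$2"
    $IPFS name publish --key="$key" "/ipfs/$cid"
}
```

You would call it as republish mykey <new top level folder hash> each time the directory changes; the IPNS address your friends use stays the same.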

Thank you. From what I can tell so far, ipfs files is exactly what I am looking for.
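
For anyone finding this thread later, here is a minimal sketch of how the ipfs files approach might look for the original question. It is not a tested recipe: the helper name bundle_cids, the MFS path /bundle and the IPFS variable are all made up for illustration (the variable lets you dry-run with echo). As far as I understand, ipfs files cp only adds a link to the existing object, so the content itself should not need to leave the datastore.

```shell
#!/bin/sh
# Sketch only. IPFS defaults to the real CLI; set IPFS=echo for a dry run.
IPFS="${IPFS:-ipfs}"

# bundle_cids DIR CID...   (hypothetical helper name)
# Builds an MFS directory DIR whose entries are the given CIDs,
# each named after its own hash, then prints the directory's hash.
bundle_cids() {
    dir="$1"
    shift
    # Create the target directory in MFS (-p: also creates missing parents).
    $IPFS files mkdir -p "$dir" || return 1
    for cid in "$@"; do
        # Link the existing object into the directory under its own hash.
        $IPFS files cp "/ipfs/$cid" "$dir/$cid" || return 1
    done
    # Print only the hash of the resulting directory.
    $IPFS files stat --hash "$dir"
}
```

Usage would be something like bundle_cids /bundle hash1 hash2, and the hash printed at the end is the single one to hand out for pinning. Note that ipfs files cp fails if the destination entry already exists, so when one of the hashes changes you would ipfs files rm the old entry and cp the new one, then read the new directory hash with ipfs files stat --hash.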