In many countries with internet censorship, sneakernet is a very common form of data transfer. Even in countries without drastic internet censorship, a multi-terabyte-sized external hard drive is a pretty efficient way to transfer large amounts of data.
Currently, as far as I can see, there is no support for mirroring the contents of the block store to an external hard drive. Or am I missing something?
Would it be a good idea to add this? It would seem to be quite simple conceptually.
In the most primitive form, you would have a command to export data from the block store to an external location, e.g.
ipfs export location (hashes)
and a command to import from a location, such as
ipfs import location
Data in the exported location would still be in the native format of the block store, with hashes to ensure content integrity and to allow for quick import and export.
A more sophisticated version would permanently monitor a number of file system locations and sync to/from them.
I know it's not intuitive, but you don't need special commands to do this because IPFS is content-addressed. If you get or cat the files out of IPFS and then add them to another IPFS node elsewhere, you will end up with the same hashes because you have the same content.
In other words, your suggested ipfs export and ipfs import commands are just a different name for ipfs get and ipfs add.
Instead of needing the proposed commands ipfs export location (hashes) and ipfs import location, you can just run these existing commands:
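A sketch of what that round trip might look like in practice (assuming the go-ipfs CLI, an external drive mounted at /mnt/external, and `<hash>` as a placeholder for the content you want to move):

```shell
# "Export": fetch the DAG behind a hash out onto the external drive
ipfs get <hash> -o /mnt/external/<hash>

# "Import": on the other machine, add the files back in; identical
# content yields identical hashes, so no separate format is needed
ipfs add -r /mnt/external/<hash>
```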
Now, it will be especially interesting when it's easier to run multiple IPFS nodes on the same machine, because then you can run a node whose data store is on the external drive and another node whose data store is in the default location, and then use strategies like ipfs-cluster to sync content across the nodes. That will be really useful, but it's not necessary to satisfy the basic sneakernet use case you're pointing to.
I know it's not intuitive, but you don't need special commands to do this because IPFS is content-addressed. If you get or cat the files out of IPFS and then add them to another IPFS node elsewhere, you will end up with the same hashes because you have the same content.
That is perfectly clear. I was thinking that not converting from/to a file system hierarchy would be more efficient and more reliable, since the hashes would prevent unnoticed data corruption.
But just running a second IPFS node that uses the external drive as its data store might be a clean solution. This would benefit from an efficient transport between two IPFS nodes on the same machine (Unix sockets or shared memory?), but that is just an optimisation detail. I guess you could write a polished GUI for sneakernet based on this approach.
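For what it's worth, pointing a second node at the external drive doesn't require any new feature: the repo location is controlled by the IPFS_PATH environment variable. A rough sketch (assuming the drive is mounted at /mnt/external; the second node's Addresses.* config would also need adjusting so its ports don't clash with the main daemon):

```shell
# Initialise a separate repo on the external drive
IPFS_PATH=/mnt/external/ipfs ipfs init

# Run the second node against that repo (after reconfiguring its
# API/gateway/swarm ports to avoid the main daemon's)
IPFS_PATH=/mnt/external/ipfs ipfs daemon
```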
One thing: how would you make sure that everything from the main ipfs is synced to the external drive ipfs, not just specific hashes or pinned hashes? I know the ipfs philosophy is to share data only on explicit demand, but in this case the option to share everything might be useful.
how would you make sure that everything from the main ipfs is synced to the external drive ipfs, not just specific hashes or pinned hashes?
I can't remember off the top of my head, but I think there is a command to list all the hashes in your ipfs repo. There is definitely a command to list all of your pinned hashes. You can pipe that into whatever sync routine you run. In the long run, you will need to make explicit pin sets for the things you want to sync, and accumulate metadata about those pin sets, so you know what data you're copying to which machines. That metadata layer is not explicitly accommodated by the ipfs protocol; you need to build it yourself.
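Those listing commands do exist: `ipfs refs local` prints every block in the repo, and `ipfs pin ls` prints the pinned ones. A crude sync loop along those lines (a sketch only, assuming a second repo on a drive mounted at /mnt/external):

```shell
# Every recursively pinned root in the main repo...
ipfs pin ls --type recursive -q |
while read hash; do
  # ...exported to a staging area on the drive, then re-added
  # into the external repo
  ipfs get "$hash" -o /mnt/external/staging/"$hash"
  IPFS_PATH=/mnt/external/ipfs ipfs add -r /mnt/external/staging/"$hash"
done
```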
That's not entirely true. We may end up importing the files with a different chunking algorithm.
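The chunking caveat can be made concrete with a toy model: even identical bytes produce a different root hash if the chunker differs. This is not IPFS's actual Merkle DAG construction, just a flat hash-of-chunk-hashes to illustrate the point:

```python
import hashlib

def root_hash(data: bytes, chunk_size: int) -> str:
    # Split into fixed-size chunks, hash each chunk, then hash the
    # concatenated chunk hashes -- a toy stand-in for a Merkle DAG root
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    chunk_hashes = [hashlib.sha256(c).digest() for c in chunks]
    return hashlib.sha256(b"".join(chunk_hashes)).hexdigest()

data = b"the same content " * 1000

# Same content, same chunker: identical root hash on any node
assert root_hash(data, 256) == root_hash(data, 256)

# Same content, different chunk size: a different root hash
assert root_hash(data, 256) != root_hash(data, 512)
```

This is why two nodes only converge on the same hashes when they use the same chunking parameters, not merely the same bytes.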
This would benefit from an efficient transport for two ipfs nodes on the same machine (unix sockets or shared memory?).
We'd like this, not only for communicating between nodes on a single machine but for communicating between a running daemon and the CLI tool, but we don't have it yet.
One thing: how would you make sure that everything from the main ipfs is synced to the external drive ipfs, not just specific hashes or pinned hashes? I know the ipfs philosophy is to share data only on explicit demand, but in this case the option to share everything might be useful.
Not that I know of and, due to GC, it's probably best not to rely on this. Personally, I recommend either pinning or adding files you care about to your local MFS (ipfs files ...).
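The MFS route mentioned above protects content from GC by referencing it from your files root; roughly like this (`<hash>` being whatever you want to keep):

```shell
# Referencing a hash from MFS keeps it out of GC's reach
ipfs files mkdir /keep
ipfs files cp /ipfs/<hash> /keep/<hash>
```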
Note: an alternative to all of this is to shut down your local daemon and copy the repo. However, that format changes over time, so it's less likely to be stable.
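If you do go the repo-copy route, the shape of it is roughly this (paths are illustrative; the default repo location is ~/.ipfs unless IPFS_PATH overrides it):

```shell
# Make sure the daemon is stopped so the repo isn't written mid-copy
ipfs shutdown

# Copy the entire repo to the external drive
cp -a "${IPFS_PATH:-$HOME/.ipfs}" /mnt/external/ipfs-repo-backup
```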
Seekable/Traversable: It should be possible to traverse a DAG through a CAR in one pass (without necessarily reading the entire CAR).
Simple/Stable: Importing/Exporting should be easy. That repo I linked to mentions things like signatures, metadata, etc. However, thatās really a separate concern.
Compact.
I'm currently writing a proposal, but a CAR will likely be a concatenation of a topological sort of the IPLD DAG. Specifically:
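The idea of a concatenation of a topological sort of the DAG can be sketched in miniature. The names and layout below are illustrative only, not the actual CAR proposal: a tiny content-addressed DAG is serialised root-first, so a reader meets each block before any of the blocks it links to, satisfying the one-pass traversal goal above:

```python
import hashlib

# A tiny in-memory "block store": hash -> (value, links-to-children)
blocks = {}

def put(value: bytes, links=()):
    # Content-address each node over its value plus its links
    raw = value + b"".join(l.encode() for l in links)
    h = hashlib.sha256(raw).hexdigest()
    blocks[h] = (value, list(links))
    return h

leaf_a = put(b"file chunk A")
leaf_b = put(b"file chunk B")
root = put(b"directory node", links=(leaf_a, leaf_b))

def topo_order(start):
    # Depth-first walk emitting each block before its children,
    # so a reader can traverse the DAG in a single pass
    order, seen = [], set()
    def walk(h):
        if h in seen:
            return
        seen.add(h)
        order.append(h)
        for child in blocks[h][1]:
            walk(child)
    walk(start)
    return order

# "CAR" here = a header naming the root, then blocks in that order
car = {"roots": [root], "blocks": topo_order(root)}
assert car["blocks"][0] == root
```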
I know I'm picking up an old conversation here, but I felt it is relatively close to what I'm asking about. First, I'm not a developer, but I have a question.
Does anyone know of a way, or whether it could even be possible (forgive me if this sounds like a dumb question), to format a hard drive in a kind of "local" IPFS? I.e. instead of having a drive formatted as Mac journaled, APFS, exFAT, etc., it would be a kind of "native" IPFS.
The use case I'm wondering about has to do with the inherent hashing of files, the versioning ability, and the ability to effectively make your own "private network" of drives whose access you control with something like a swarm key. So the distributed public-network aspects, at least for this part, are not of use.