IPFS for audiovisual archives and preservation?

Just installed IPFS and am tinkering (stumbling around) with it. In the meantime would like to know how appropriate IPFS might be for video preservation, which by nature implies files in the hundreds of GBs each and for which the generation of a hash takes considerable time & processor resources. Is this overcome by the distributed nature of the system or is it simply increased load? On the other hand, I can imagine Blockchain or some other *chain application as particularly appropriate for preservation metadata both descriptive and technical. Secondly, if these technologies are used by libraries and archives, what are their implications for preservation file migration (considering the hash)? Finally, am on the org committee for an Audio Engineering Society conference to be held in June 2018 and wonder if anyone here would like to propose a presentation that might get the audiovisual community thinking about these technologies as they might apply to audiovisual preservation and archiving. Apologies for including the link to that conference here:
http://www.aes.org/conferences/2018/archiving/

Just checking. Were these questions off the mark or somehow inappropriate, or just uninteresting? Thanks!

Just checking. Were these questions off the mark or somehow inappropriate, or just uninteresting? Thanks!

No, we’re just really busy.

In the meantime would like to know how appropriate IPFS might be for video preservation.

Yes.

the generation of a hash takes considerable time & processor resources

In the scheme of things, this isn’t that bad. I wouldn’t use IPFS as scratch space for video editing software, but hashing a video once isn’t likely to be much of an issue, at least in theory. IPFS currently has a lot of other (unnecessary) overhead, but the hashing itself shouldn’t be that bad. We’ve also looked into switching to a faster default cryptographic hash function (sha256 is a bit slow) but haven’t had time, since hashing hasn’t generally been the bottleneck.
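
To get a feel for the one-time cost, here is a minimal sketch that streams a file through SHA-256 in fixed-size chunks, so memory use stays flat even for files in the hundreds of GBs. The function name sha256_file and the 1 MiB chunk size are assumptions made for illustration. This is a plain whole-file digest, not the way IPFS derives its addresses.

```python
import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Timing this on a representative video file should show whether hashing or disk throughput dominates on a given machine.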

Is this overcome by the distributed nature of the system or is it simply increased load?

No, it’s a consequence of the decentralized nature. If IPFS were centralized, we could get away with a much faster non-cryptographic “checksum” function. However, you can parallelize hashing/importing data into IPFS across multiple computers. I’d take a look at ipfs-cluster.
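
Hashing parallelizes well because each chunk of a file can be hashed independently and the chunk digests combined afterwards. The sketch below shows that idea on a single machine with threads; spreading the same work across several computers follows the same pattern. The names hash_chunk and root_digest, the chunk size, and the flat "hash of the concatenated chunk digests" root are all assumptions chosen for illustration, and the result is not an IPFS address.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 256 * 1024  # illustrative chunk size (an assumption)


def hash_chunk(path, offset, size):
    """Hash one byte range of the file; each call opens its own file handle."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.sha256(f.read(size)).digest()


def root_digest(path, chunk_size=CHUNK_SIZE, workers=4):
    """Hash chunks in parallel, then hash the ordered list of chunk digests."""
    offsets = range(0, os.path.getsize(path), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the root does not depend on scheduling.
        leaves = list(pool.map(lambda off: hash_chunk(path, off, chunk_size), offsets))
    return hashlib.sha256(b"".join(leaves)).hexdigest()
```

The root depends only on the file contents and the chunk size, so the number of workers changes the speed but not the result.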

On the other hand, I can imagine Blockchain or some other *chain application as particularly appropriate for preservation metadata both descriptive and technical.

You can build blockchains on top of IPFS but IPFS itself isn’t necessarily related to blockchains (other than the fact that it uses merkle-links). IPFS is closer to git. However, you certainly could use a blockchain to track and pay for storage of files in IPFS (we’re actually working on a project called Filecoin that will tackle exactly this).

Secondly, if these technologies are used by libraries and archives, what are their implications for preservation file migration (considering the hash)?

I’m not sure what you’re asking here. Hashing ensures that we can easily validate files stored in IPFS. We generally migrate data over the network using a protocol we call bitswap. At the moment, this isn’t the fastest protocol but we’re working on that (again, dev time is always the limiting factor).
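
The validation point can be shown with a toy content-addressed store: the key for a block is the hash of its bytes, so anyone who fetches the block can recompute the hash and detect corruption. The class name ContentStore and the in-memory dictionary are assumptions for illustration only and say nothing about how IPFS stores data on disk.

```python
import hashlib


class ContentStore:
    """Toy in-memory store that addresses each block by its SHA-256 digest."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        """Store the bytes and return their address (hex digest)."""
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        """Fetch the bytes and verify them against the address before returning."""
        data = self._blocks[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block does not match its address")
        return data
```

Because the address is derived from the content, changing a single byte of a file yields a different address instead of silently replacing the original.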

Finally, am on the org committee for an Audio Engineering Society conference to be held in June 2018 and wonder if anyone here would like to propose a presentation that might get the audiovisual community thinking about these technologies as they might apply to audiovisual preservation and archiving.

cc @flyingzumwalt

Thank you for the response.

I asked about video because those are in general the largest files (apart from huge datasets) that archives typically deposit for preservation. The obvious attractions are the robustness of the distributed FS, the immutability of objects stored, and the potential cost savings for smaller, typically underfunded archives to afford preservation storage for uncompressed or mathematically lossless compressed files rather than being tempted to rely upon squeezing large video files into smaller, lossy, compressed formats for the sake of cost savings.

Regarding preservation file migration and the hash, my concern was whether or not replacing obsolete preservation files (when that becomes necessary) would be more complicated using IPFS than a centralized file system. Probably the best thing to do is to model such a system and see where the bottlenecks are (if any). Since the migration process would be automated and therefore not overly time sensitive, maybe it is not an issue.
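
A first cut at such a model might look like the sketch below. Since a migrated file has different bytes, it necessarily gets a different hash, so the catalog keeps a stable item identifier and appends each new content hash to it instead of overwriting the old one. Everything here is hypothetical: the names CatalogEntry, add_version and current, the format labels, and the use of a plain SHA-256 digest as a stand-in for an IPFS address.

```python
import hashlib
from dataclasses import dataclass, field


def content_id(data: bytes) -> str:
    """Stand-in for a content address: the hex SHA-256 of the bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class CatalogEntry:
    """One archival item with its migration history, oldest version first."""
    item_id: str
    versions: list = field(default_factory=list)  # (format_label, content_id) pairs

    def add_version(self, format_label: str, data: bytes) -> str:
        """Record a new preservation copy and return its content address."""
        cid = content_id(data)
        self.versions.append((format_label, cid))
        return cid

    def current(self):
        """Return the most recent (format_label, content_id) pair."""
        return self.versions[-1]
```

In this model a migration is an append to the catalog, and the older hashes stay available as an audit trail for as long as someone keeps the corresponding files.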

Filecoin! Yes, thanks. I look forward to following the development.