What is better? Large containers or large sets of files?

I think this was never fully answered.

Let’s say we have Ubuntu, Xubuntu, Edubuntu, Kubuntu,… Live ISOs. Files are stored inside the ISO in a compressed squashfs filesystem. It is safe to assume that a large portion, if not the majority, of the individual files inside the squashfs filesystem are identical across these ISOs. But since the files may be arranged differently in each squashfs image, the fixed-block-size chunks may all differ between the Live ISOs.
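To make the misalignment concrete, here is a toy Python sketch. The 4-byte block size and the made-up byte strings are just stand-ins for real 256 KiB chunks and real squashfs contents; the point is only that a shifted offset changes every downstream chunk hash:

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 4) -> list:
    """Split data into fixed-size blocks and hash each block."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()[:8]
            for i in range(0, len(data), size)]

# The same file content, preceded by different surrounding data in each
# squashfs image, lands at different offsets -- so none of the fixed-size
# blocks line up any more.
shared_file = b"IDENTICAL-LIBC-CONTENT-IN-EVERY-ISO"
iso_a = b"AAA" + shared_file
iso_b = b"BBBBBB" + shared_file

print(set(fixed_chunks(iso_a)) & set(fixed_chunks(iso_b)))  # -> set(), nothing reused
```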

Wouldn’t it be most efficient for deduplication and re-use of already downloaded parts if chunks were not made using predefined block sizes, but with some knowledge of the squashfs filesystem?

Or, could we have IPFS chunk the squashfs based on the individual files that make up a Linux Live ISO? In this case, the file libc.so.6 from Ubuntu, Xubuntu, Edubuntu, Kubuntu,… would get the same hash (because it is always the same file) and could be shared across all of the mentioned Live ISOs.
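As a rough sketch of what per-file hashing would buy us (the filenames and contents below are obviously made up, and real IPFS would build a merkle DAG rather than a flat dict):

```python
import hashlib

def file_level_hashes(files: dict) -> dict:
    """Hash each file's content on its own, roughly like one merkle leaf per file."""
    return {name: hashlib.sha256(content).hexdigest()[:12]
            for name, content in files.items()}

libc = b"...identical bytes of libc.so.6 in every ISO..."   # stand-in content

ubuntu  = {"libc.so.6": libc, "branding.png": b"ubuntu artwork"}
xubuntu = {"libc.so.6": libc, "branding.png": b"xfce artwork"}

# Same content -> same hash -> one shared block across all of the Live ISOs.
print(file_level_hashes(ubuntu)["libc.so.6"] ==
      file_level_hashes(xubuntu)["libc.so.6"])   # True
```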

In other words, wouldn’t we need content-aware chunking mechanisms rather than fixed block sizes?
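In case it helps the discussion, here is a minimal sketch of content-defined chunking with a toy rolling checksum (the tiny window, mask, and random stand-in data are chosen only so the example runs on a few kilobytes; real implementations use Rabin fingerprints or buzhash and much larger average chunk sizes). Because cut points depend only on local content, the boundaries resynchronize after the differing prefix and most chunks are shared again:

```python
import hashlib
import random

def cdc_chunks(data: bytes, window: int = 8, mask: int = 0x0F) -> set:
    """Toy content-defined chunking: slide a small window over the data and
    cut a chunk whenever the low bits of the window's byte sum match `mask`.
    Cut points depend only on nearby bytes, so an insertion early in the
    stream does not shift every later boundary (unlike fixed-size blocks)."""
    chunks, start = [], 0
    for i in range(len(data)):
        window_sum = sum(data[max(0, i - window + 1): i + 1])
        if window_sum & mask == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return {hashlib.sha256(c).hexdigest()[:8] for c in chunks}

random.seed(0)                                             # deterministic toy data
shared = bytes(random.randrange(256) for _ in range(2000)) # stand-in for shared file content
iso_a = b"short-header" + shared
iso_b = b"a-much-longer-and-different-header" + shared

common = cdc_chunks(iso_a) & cdc_chunks(iso_b)
print(len(common), "chunk hashes shared despite the different prefixes")
```

If I’m not mistaken, `ipfs add` already offers a Rabin-based `--chunker` option that gives this kind of boundary resynchronization, but it still knows nothing about the file boundaries inside a squashfs image.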

The answer to this is also relevant to IPFS for AppImage: Distribution of Linux applications