What is better? Large containers or large sets of files?

I think this was never fully answered.

Let’s say we have Ubuntu, Xubuntu, Edubuntu, Kubuntu,… Live ISOs. Files are stored inside the ISO in a compressed squashfs filesystem. It is safe to assume that a large portion, if not the majority, of the individual files inside the squashfs filesystem are identical across these ISOs. But since the files may be arranged differently in each squashfs image, the fixed-block-size chunks may all differ between the Live ISOs.
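To make the misalignment concrete, here is a toy Python sketch. The 4-byte block size and the made-up byte strings are just stand-ins for real 256 KiB chunks and real squashfs contents; the point is only that a shifted offset changes every downstream chunk hash:

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 4) -> list:
    """Split data into fixed-size blocks and hash each block."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()[:8]
            for i in range(0, len(data), size)]

# The same file content, preceded by different surrounding data in each
# squashfs image, lands at different offsets -- so none of the fixed-size
# blocks line up any more.
shared_file = b"IDENTICAL-LIBC-CONTENT-IN-EVERY-ISO"
iso_a = b"AAA" + shared_file
iso_b = b"BBBBBB" + shared_file

print(set(fixed_chunks(iso_a)) & set(fixed_chunks(iso_b)))  # -> set(), nothing reused
```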

Wouldn’t it be most efficient for deduplication and re-use of already downloaded parts if chunks were not made using predefined block sizes, but with some knowledge of the squashfs filesystem?

Or, could we have IPFS chunk the squashfs based on the individual files that make up a Linux Live ISO? In this case, the file libc.so.6 from Ubuntu, Xubuntu, Edubuntu, Kubuntu,… would get the same hash (because it is always the same file) and could be shared across all of the mentioned Live ISOs.
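As a rough sketch of what per-file hashing would buy us (the filenames and contents below are obviously made up, and real IPFS would build a merkle DAG rather than a flat dict):

```python
import hashlib

def file_level_hashes(files: dict) -> dict:
    """Hash each file's content on its own, roughly like one merkle leaf per file."""
    return {name: hashlib.sha256(content).hexdigest()[:12]
            for name, content in files.items()}

libc = b"...identical bytes of libc.so.6 in every ISO..."   # stand-in content

ubuntu  = {"libc.so.6": libc, "branding.png": b"ubuntu artwork"}
xubuntu = {"libc.so.6": libc, "branding.png": b"xfce artwork"}

# Same content -> same hash -> one shared block across all of the Live ISOs.
print(file_level_hashes(ubuntu)["libc.so.6"] ==
      file_level_hashes(xubuntu)["libc.so.6"])   # True
```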

In other words, wouldn’t we need content-aware chunking mechanisms rather than fixed block sizes?
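In case it helps the discussion, here is a minimal sketch of content-defined chunking with a toy rolling checksum (the tiny window, mask, and random stand-in data are chosen only so the example runs on a few kilobytes; real implementations use Rabin fingerprints or buzhash and much larger average chunk sizes). Because cut points depend only on local content, the boundaries resynchronize after the differing prefix and most chunks are shared again:

```python
import hashlib
import random

def cdc_chunks(data: bytes, window: int = 8, mask: int = 0x0F) -> set:
    """Toy content-defined chunking: slide a small window over the data and
    cut a chunk whenever the low bits of the window's byte sum match `mask`.
    Cut points depend only on nearby bytes, so an insertion early in the
    stream does not shift every later boundary (unlike fixed-size blocks)."""
    chunks, start = [], 0
    for i in range(len(data)):
        window_sum = sum(data[max(0, i - window + 1): i + 1])
        if window_sum & mask == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return {hashlib.sha256(c).hexdigest()[:8] for c in chunks}

random.seed(0)                                             # deterministic toy data
shared = bytes(random.randrange(256) for _ in range(2000)) # stand-in for shared file content
iso_a = b"short-header" + shared
iso_b = b"a-much-longer-and-different-header" + shared

common = cdc_chunks(iso_a) & cdc_chunks(iso_b)
print(len(common), "chunk hashes shared despite the different prefixes")
```

If I’m not mistaken, `ipfs add` already offers a Rabin-based `--chunker` option that gives this kind of boundary resynchronization, but it still knows nothing about the file boundaries inside a squashfs image.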

The answer to this is also relevant to IPFS for AppImage: Distribution of Linux applications