File systems chunk small files - big files

From @rddaz2013 on Tue Feb 07 2017 16:30:29 GMT+0000 (UTC)

I hope the description on the site https://medium.com/@ConsenSys/an-introduction-to-ipfs-9bba4860abd0#.zbe99k4rn is not out of date.

My question is…

A small file (< 256 kB) is represented by an IPFS object with data being the file contents (plus a small header and footer) and no links, i.e. the links array is empty. Note that the file name is not part of the IPFS object, so two files with different names and the same content will have the same IPFS object representation and hence the same hash.

That's fine… but for large files…

A large file (> 256 kB) is represented by a list of links to file chunks that are < 256 kB, and only minimal Data specifying that this object represents a large file. The links to the file chunks have empty strings as names.

Why 256kB?
Is the hash taken only from the data, and not from ‘Name’ and ‘Link’?



Copied from original issue: https://github.com/ipfs/faq/issues/223

From @whyrusleeping on Tue Feb 07 2017 19:23:11 GMT+0000 (UTC)

256k was chosen as a decent chunk size that is evenly divisible by standard disk block sizes. Hashes are taken over the entire structure, as in a Merkle tree.
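
To make the "entire structure" point concrete, here is a toy sketch in Python. It is not the real IPFS encoding: actual objects are serialized as protobuf and addressed by multihash, whereas this sketch uses JSON and plain SHA-256 as stand-ins, and the names chunk, node_hash and add_file are made up for illustration. What it should show is that a parent's hash covers its links (names, child hashes, sizes) as well as its data, while a file name never enters the picture.

```python
import hashlib
import json

CHUNK_SIZE = 256 * 1024  # default unixfs chunk size mentioned in this thread


def chunk(data, size=CHUNK_SIZE):
    """Split data into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def node_hash(data, links):
    """Hash a toy object. The digest covers BOTH the data and the links.

    Assumption: JSON + SHA-256 stand in for the real serialization and
    hashing used by IPFS.
    """
    payload = json.dumps(
        {"data": data.hex(), "links": links}, sort_keys=True
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_file(data):
    """Return the root hash of a file, chunking it if it is large."""
    if len(data) <= CHUNK_SIZE:
        # Small file: one object, data only, empty links array.
        return node_hash(data, [])
    # Large file: one leaf object per chunk, plus a parent that holds
    # only links. Link names are empty strings, as described above.
    links = [
        {"name": "", "hash": node_hash(piece, []), "size": len(piece)}
        for piece in chunk(data)
    ]
    return node_hash(b"", links)
```

Calling add_file twice on the same bytes gives the same root hash no matter what the file is called on disk, and flipping a single byte in any chunk changes that chunk's hash, which changes the parent's links and therefore the root hash.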

What happens when I try to put an object that is larger than 256 kB? Will it be split into several objects automatically, or is this only a feature of the file layer?

Does only the data part of an IPFS object need to be smaller than 256 kB, or is the size of the link array included in this limit?

Putting objects directly (via ipfs dag put or ipfs object put) will not automatically chunk things. 256k is the default chunk size used by the unixfs importer, but the actual hard limit is 2MB (including framing, so you can't quite make and transfer a 2MB object).

We’re thinking about how to properly support large objects in the future, as some applications we want to support (such as git) have very large objects.

When a big file (> 256 kB) is chunked, are all pieces of the file stored on one node, or are they distributed among all (or some) nodes in the network?

If so, what happens if one of the nodes storing a piece of the file is unavailable? Does this mean that we will not be able to reassemble the entire file?

When you “add” a file, none of its parts is sent anywhere. You just advertise to other peers: “Hey, I got this file. If you want it, you can ask me and I’ll send it to you.” If they do, they now have a copy, and other peers can ask them too.
A file you add (chunked or not) always stays at least on your disk, unless you garbage-collect it.
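
The add-then-fetch flow described above can be mimicked with a toy in-memory model. This is only a sketch: the Peer class, its add and fetch methods, and the explicit providers list are hypothetical stand-ins (real nodes discover providers through the network rather than being handed a list), and plain SHA-256 stands in for the real content addressing.

```python
import hashlib


class Peer:
    """Toy peer with a local block store keyed by content hash."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # hash -> bytes held on this peer's "disk"

    def add(self, data):
        """Store data locally only; nothing is pushed to other peers."""
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key

    def fetch(self, key, providers):
        """Ask each provider for the block and keep a copy if one has it."""
        if key in self.blocks:
            return self.blocks[key]
        for peer in providers:
            data = peer.blocks.get(key)
            # Verify the content against the hash we asked for.
            if data is not None and hashlib.sha256(data).hexdigest() == key:
                self.blocks[key] = data  # this peer can now serve it too
                return data
        raise KeyError("no reachable provider has block " + key)


alice, bob, carol = Peer("alice"), Peer("bob"), Peer("carol")
key = alice.add(b"some file contents")

# Adding sent nothing to bob; he only gets a copy when he asks for it.
bob.fetch(key, [alice])

# Even with alice unreachable, carol can still get the block from bob.
carol.fetch(key, [bob])
```

In this model a block becomes unavailable only when no reachable peer holds a copy, which is the situation the question above worries about: if the original node is the only one that has a given piece and it goes offline, the file cannot be reassembled until it comes back.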