Why not just a DHT of who has which file?

Wouldn’t it reduce the size of the DHT if, instead of chunks, we maintained a hash of whole files and who has them? The node that needs a file could then ask for a chunk of that file from those who have it.
This would avoid the data-duplication problem that IPFS currently has.
What is the point of giving each chunk of a file a separate identity?

Like, right now, for each file IPFS demands extra space bigger than the file itself to store the chunks of the file. Wouldn’t it make more sense if IPFS just took the hash and path of that file and advertised it on the network, instead of making a duplicate bigger than the initial data?

Or even better: we could tell IPFS that whatever file is put in a specific path should automatically be hashed and announced to the network, and the node would provide the nth chunk anytime another node asks for the nth chunk,
without dissecting the poor file and storing it in a dissected state.
Storing it dissected forces the OS to reconstruct the main file from the chunks every time we need that file, which adds extra work.
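A minimal sketch of that serve-from-the-intact-file idea, just to make it concrete. The chunk size and the helper name here are made up for illustration; a real system would use something like 256 KiB chunks:

```python
import tempfile

CHUNK_SIZE = 4  # tiny, for illustration only

def read_chunk(path: str, n: int) -> bytes:
    """Serve the nth chunk straight out of the intact file via seek(),
    instead of keeping a second, dissected copy on disk."""
    with open(path, "rb") as f:
        f.seek(n * CHUNK_SIZE)
        return f.read(CHUNK_SIZE)

# Demo: a 12-byte "file" made of three 4-byte chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"AAAABBBBCCCC")
    path = tmp.name

print(read_chunk(path, 1))  # b'BBBB'
```

The file stays whole on disk; chunk boundaries exist only at read time.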

Chunking allows you to potentially deduplicate portions of a file. It also allows you to retrieve different blocks from different nodes. There was a previous discussion of ideas for resolving files by the complete file hash rather than the Merkle root.
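To make the deduplication point concrete, here is a toy sketch of a content-addressed block store where two files share chunks. The fixed chunk size and the sample data are invented for illustration:

```python
import hashlib

CHUNK_SIZE = 4  # tiny, for illustration; IPFS defaults to much larger chunks

# Two "files" whose second and third chunks are identical.
file_a = b"AAAABBBBCCCC"
file_b = b"XXXXBBBBCCCC"

store = {}  # content-addressed block store: sha256 hex -> chunk bytes
for f in (file_a, file_b):
    for i in range(0, len(f), CHUNK_SIZE):
        chunk = f[i:i + CHUNK_SIZE]
        store[hashlib.sha256(chunk).hexdigest()] = chunk

# Six chunks were added, but the shared BBBB and CCCC are stored once each.
print(len(store))  # 4
```

Because identical chunks hash to the same address, the store keeps only one copy of each shared region, and either file can be reassembled from it.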

If you just ask for the nth chunk, you have no way of verifying that the entire file is correct until you’ve collected all the chunks, and no way of knowing which chunk is corrupt.
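A short sketch of that verification point, with invented sample chunks: with only a whole-file hash you learn "something is wrong" at the very end, while per-chunk hashes let you pinpoint and re-request just the bad chunk:

```python
import hashlib

def sha256(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

# The per-chunk hashes a manifest would carry.
chunks = [b"chunk-0", b"chunk-1", b"chunk-2"]
manifest = [sha256(c) for c in chunks]

# Simulate a transfer where the middle chunk arrives corrupted.
received = [b"chunk-0", b"chunk-X", b"chunk-2"]

# Check each received chunk against its expected hash.
bad = [i for i, c in enumerate(received) if sha256(c) != manifest[i]]
print(bad)  # [1]
```

Only chunk 1 needs to be fetched again, rather than the whole file.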


How often do two files have identical blocks?
The mathematical chance is one over 2 to the power of 256,000.

Ahh, you’re right, we need to know which chunk is corrupt, instead of flushing the entire 10 GB file and redownloading it again.

But is it necessary to store the data and the metadata combined?
Couldn’t we just store the hash of the nth chunk in the IPFS engine and the data itself in the file?

I think the --raw-leaves option might be somewhat analogous to what you’re talking about, or at least the filestore, which requires the raw-leaves option.

One nice thing about storing them combined is that you don’t need a separate mechanism for sharing the metadata than you do for the data. Metadata is just data, after all.
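A quick sketch of "metadata is just data": the list of chunk hashes is itself plain bytes, so it can be content-addressed and fetched through exactly the same machinery as the chunks (this is the Merkle-root idea in miniature; the layout here is invented, not the real IPFS DAG format):

```python
import hashlib

def sha256(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

chunks = [b"chunk-0", b"chunk-1", b"chunk-2"]

# The "metadata" is just a newline-separated list of chunk hashes...
manifest = "\n".join(sha256(c) for c in chunks).encode()

# ...and since the manifest is itself bytes, it gets an address the same
# way any other block does. That address can serve as the file's identity.
root = sha256(manifest)
print(len(root))  # 64 hex characters, same shape as any chunk's address
```

Nothing special is needed to distribute the manifest: a node asks for the root block, reads the chunk hashes out of it, then asks for those.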


So IPFS tries to impose a new, unified approach to data, instead of respecting the old order and trying to maintain harmony with it.

Relax, this is IPFS not Anakin Skywalker.

Come on, don’t try to be humble, this is IPFS.
IPFS is packed with ambition, even in its name.