Monolithic File Hash: a proposal

Currently, several different CIDs can refer to what is actually the same file.
Wouldn’t it be good if a file CID (a CID that corresponds to a particular file) contained not only the hashes of its chunks but also a hash of the entire file, so that files could be identified by that monolithic hash?
Then, if one CID is not available, the system would know the alternative CIDs for that file, and one of those might be available.
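
To illustrate the point, here is a minimal sketch of why the same bytes can end up under different chunk-based identifiers while still sharing a single whole-file hash. It is a toy model only: `toy_root_hash` and `whole_file_hash` are hypothetical helpers, the two chunk sizes are picked for illustration, and real IPFS builds a Merkle DAG rather than hashing concatenated digests.

```python
import hashlib

def toy_root_hash(data: bytes, chunk_size: int) -> str:
    """Hash each chunk, then hash the concatenated chunk digests.

    Toy stand-in for a chunk-based identifier; NOT the real IPFS layout.
    """
    digests = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    return hashlib.sha256(b"".join(digests)).hexdigest()

def whole_file_hash(data: bytes) -> str:
    """The 'monolithic' hash: SHA-256 over the entire content."""
    return hashlib.sha256(data).hexdigest()

data = bytes(range(256)) * 4096            # 1 MiB of sample content

root_a = toy_root_hash(data, 256 * 1024)   # four chunks
root_b = toy_root_hash(data, 1024 * 1024)  # one chunk

print(root_a == root_b)                    # chunk-based identifiers differ
print(whole_file_hash(data))               # same value whatever the chunking
```

The whole-file hash never looks at chunk boundaries, which is what would let it act as a common key for all the chunk-based identifiers of one file.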

There was an extensive discussion about this on this forum about 3 weeks ago. I think the suggestion was to identify all files by their SHA256, and I’m pretty sure it was deemed impractical for technical reasons. Sorry, I don’t have the link, but I bet you can find it.

As far as I understand, one of the points of the CID is that one day we may have a standard other than SHA256. What then?

I think you mean the “multihash” part of the CID?
In that case, I think we can migrate to the new hashing algorithm gradually, without it having to happen all at once.
When you deal with a mass of participants, it’s important to implement mechanisms that keep the system functional without requiring the crowd to act as if it were a single agent.
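
As a rough sketch of why a self-describing hash allows gradual migration: a multihash prefixes the digest with a code for the hash function and the digest length, so old and new identifiers can coexist and each one can be verified on its own. The codes below (0x12 for sha2-256, 0x13 for sha2-512) come from the multiformats table; the helper names are hypothetical, and the single-byte prefix handling is a simplification that only holds for codes and lengths below 128.

```python
import hashlib

# name -> (multihash function code, hash constructor)
HASHES = {
    "sha2-256": (0x12, hashlib.sha256),
    "sha2-512": (0x13, hashlib.sha512),
}

def make_multihash(data: bytes, name: str) -> bytes:
    """Build <code><digest length><digest> with single-byte prefixes."""
    code, constructor = HASHES[name]
    digest = constructor(data).digest()
    return bytes([code, len(digest)]) + digest

def verify_multihash(data: bytes, mh: bytes) -> bool:
    """Read the function code from the multihash itself, then re-hash and compare."""
    code, length = mh[0], mh[1]
    for known_code, constructor in HASHES.values():
        if known_code == code:
            return constructor(data).digest() == mh[2:2 + length]
    raise ValueError(f"unknown hash function code: {code:#x}")

content = b"same file, two hash functions"
old_id = make_multihash(content, "sha2-256")
new_id = make_multihash(content, "sha2-512")

# A node can verify either identifier without being told which algorithm was used.
print(verify_multihash(content, old_id), verify_multihash(content, new_id))
```

Because the algorithm travels with the identifier, participants can switch at their own pace instead of all at once.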

Do you think the stability (lack of variability) of the hashing and chunking algorithms in IPFS makes CIDs a close enough one-to-one identifier for files? Or would we have a prevalent “one file, many CIDs” problem?

As far as I understand, we do have the problem of one file, many CIDs. I do not think this problem has a general solution, though.

However, let me add this. Knowing that the file I am looking for has this particular SHA256 does not solve any problem. If you say to me, “look, the hash of the file you are looking for is xxxxxxxx” I have to trust you that you are pointing me to the right file. There is no way to be sure about that before having downloaded the file and checked its content.

Still, I agree that having many CIDs for a single file might be an issue in the long run.

You won’t need to fetch the entire file from an alternative CID if the main CID is being served, and if it is not, then downloading the entire file before checking the hash is better than not being served at all.
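
A minimal sketch of that fallback, under stated assumptions: `alternatives` is a hypothetical index from a whole-file SHA-256 to the CIDs known for that file, `fetch` stands in for whatever retrieval the node actually does (returning `None` when a CID is not served), and the CID strings are placeholders. None of this is an existing IPFS API.

```python
import hashlib
from typing import Callable, Dict, List, Optional

def fetch_with_fallback(
    file_hash: str,
    alternatives: Dict[str, List[str]],
    fetch: Callable[[str], Optional[bytes]],
) -> Optional[bytes]:
    """Try each CID known for file_hash; return the first content that verifies."""
    for cid in alternatives.get(file_hash, []):
        data = fetch(cid)  # hypothetical retrieval; None means "not served"
        if data is None:
            continue
        # The whole-file hash can only be checked once the download is complete.
        if hashlib.sha256(data).hexdigest() == file_hash:
            return data
    return None

content = b"hello ipfs"
file_hash = hashlib.sha256(content).hexdigest()

# Toy network: cid-A is not served, cid-B returns wrong bytes, cid-C is good.
store = {"cid-B": b"wrong bytes", "cid-C": content}
alternatives = {file_hash: ["cid-A", "cid-B", "cid-C"]}

result = fetch_with_fallback(file_hash, alternatives, store.get)
print(result == content)
```

The wrong-bytes case shows the cost raised earlier in the thread: a bad alternative is only detected after it has been fully downloaded.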
