What to do in case of hash collision?

From @Piruzzolo on Tue Aug 25 2015 18:37:45 GMT+0000 (UTC)

I was wondering how to detect and solve hash collisions… Does IPFS include systems to detect that?


Copied from original issue: https://github.com/ipfs/faq/issues/24

From @whyrusleeping on Tue Aug 25 2015 19:04:59 GMT+0000 (UTC)

ipfs uses a sha256 hash for addressing content, meaning that there are 2^256 different possible hashes. Let's assume that the entire bitcoin mining economy decides to try and find an ipfs object hash collision. Checking hashes at a rate of 400 petahash (400,000,000,000,000,000 hashes per second), it would take them roughly 2.9 × 10^59 seconds, or 9 × 10^51 years, to compute the entire space. Even factoring in the birthday paradox and Moore's law, we shouldn't need to worry about that happening for a few thousand years or so, assuming sha256 is a perfect hash function (which it might not be).
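For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The 400 petahash rate is the figure from the post; the object count used for the birthday estimate (10^18) is a purely hypothetical number chosen for illustration, and the function names are made up for this example. The birthday estimate uses the standard small-probability approximation p ≈ n(n-1) / 2^(bits+1) and assumes sha256 behaves like an ideal hash.

```python
HASH_BITS = 256
HASH_RATE = 400 * 10**15               # 400 petahash per second (figure from the post)
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def seconds_to_enumerate(bits: int, rate: int) -> float:
    """Time needed to compute every possible hash value once."""
    return 2**bits / rate


def birthday_collision_probability(n_objects: int, bits: int) -> float:
    """Approximate chance that any two of n_objects share a hash.

    Uses p ~= n(n-1) / 2^(bits+1), which is only valid while p is small
    and assumes the hash output is uniformly distributed.
    """
    return n_objects * (n_objects - 1) / 2 ** (bits + 1)


seconds = seconds_to_enumerate(HASH_BITS, HASH_RATE)
years = seconds / SECONDS_PER_YEAR
print(f"full enumeration: {seconds:.2e} s, {years:.2e} years")

# Hypothetical network size: 10^18 distinct objects.
p = birthday_collision_probability(10**18, HASH_BITS)
print(f"accidental collision among 10^18 objects: p ~= {p:.2e}")
```

Under these assumptions the accidental-collision probability stays vanishingly small even for a very large number of stored objects; the estimate says nothing about deliberate attacks on a weakened hash function.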

If, sometime down the road, we get worried about our very limited hash space, we can upgrade the hash function we use, as ipfs uses the multihash format for specifying hashes along with their type and length.
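To make "type and length" concrete, here is a rough sketch of how a multihash value is laid out: a function code, a digest length, then the digest itself. The codes 0x12 (sha2-256) and 0x13 (sha2-512) come from the multihash table. The real format encodes the code and length as unsigned varints; this sketch assumes both fit in a single byte, which holds for these two functions. The `encode_multihash` helper is hypothetical and not part of any IPFS library.

```python
import hashlib

# Function codes from the multihash table.
SHA2_256 = 0x12
SHA2_512 = 0x13

_HASHERS = {
    SHA2_256: hashlib.sha256,
    SHA2_512: hashlib.sha512,
}


def encode_multihash(data: bytes, code: int = SHA2_256) -> bytes:
    """Return <code><digest length><digest> for the given data.

    Simplification: code and length are written as single bytes, which is
    only correct while both values are below 128.
    """
    digest = _HASHERS[code](data).digest()
    return bytes([code, len(digest)]) + digest


mh256 = encode_multihash(b"hello")
mh512 = encode_multihash(b"hello", SHA2_512)
print(mh256.hex())       # starts with 12 20: sha2-256, 32-byte digest
print(mh512.hex()[:4])   # 1340: sha2-512, 64-byte digest
```

Because the prefix travels with every hash, a reader can tell which function produced it, which is what leaves room for changing the default later.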

tldr, the world will likely end before we find a sha256 collision. but if we get worried, we can use a bigger hash at pretty much any time.

From @jbenet on Tue Aug 25 2015 21:51:01 GMT+0000 (UTC)

in general, what @whyrusleeping said is accurate. Consider: the bitcoin network has not seen sha256 collisions. (People may not be wasting their resources finding a pre-image for an ipfs object, when they could just steal all the bitcoin instead – it’s the same problem).

The one future-proofing caveat is that many cryptographic hash functions have been broken over time, and that’s one reason – as @whyrusleeping mentioned – we use multihash: we can upgrade. (this is a costly thing though, as things will have to be rehashed and/or linked to, so we ought to think about how to upgrade well before we do… good thing we have a while :) ).

also, feel free to switch to sha3, you can recompile go-ipfs to do it. the rest of the ipfs network supports it already. (and we’ll add blake2 support too)

and, cryptocurrency people may want to build upgradeability into their blockchains. see https://github.com/jbenet/multihash and related protocols.

From @jbenet on Tue Aug 25 2015 21:59:44 GMT+0000 (UTC)

Though, a word of warning for the ages. Even the best people do make mistakes thinking that certain cryptographic artifacts are safe for a long time:

(HT Zooko for these links)

So we should not rest on our laurels, and should improve the upgradeability paths of our cryptographic protocols, to make it easy to ratchet up the security of a system. Multihash and related protocols are a good start, but not the end by any means.

I’ll close this, as the better question (to be asked elsewhere) is “how do we upgrade from a broken cryptographic hash function”.

From @alugarius on Mon Sep 26 2016 19:10:38 GMT+0000 (UTC)

Thanks @RichardLitt
so let’s make it short…

“and, cryptocurrency people may want to build upgradeability into their blockchains. see https://github.com/jbenet/multihash and related protocols” – @jbenet

how can this be applied to IPFS?
or how can the hash be extended?

From @tomachinz on Thu Mar 30 2017 07:12:03 GMT+0000 (UTC)

ipfs currently uses a ‘multihash’ where every hash value is tagged with the hash function used to generate it. As of now, the default hash function is sha256, and if sha256 is shown to be broken, apparently it’s possible to switch to sha512 on the fly - lengthen our root address hash function to something like sha512 without breaking any existing data.
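As an illustration of why tagged hashes let old and new addresses coexist, here is a small sketch of a verifier that picks the hash function from the tag. It is not how go-ipfs implements verification; the `verify` function is hypothetical, and it assumes the single-byte code and length prefix described for sha2-256 (0x12) and sha2-512 (0x13).

```python
import hashlib

# Function codes from the multihash table, mapped to Python's hashlib.
HASHERS = {
    0x12: hashlib.sha256,  # sha2-256
    0x13: hashlib.sha512,  # sha2-512
}


def verify(multihash: bytes, data: bytes) -> bool:
    """Check data against a multihash, choosing the function from its tag."""
    code, length, digest = multihash[0], multihash[1], multihash[2:]
    hasher = HASHERS.get(code)
    if hasher is None:
        raise ValueError(f"unsupported hash function code: {code:#x}")
    if len(digest) != length:
        return False  # truncated or padded digest
    return hasher(data).digest() == digest


block = b"some block of data"
old_address = bytes([0x12, 32]) + hashlib.sha256(block).digest()
new_address = bytes([0x13, 64]) + hashlib.sha512(block).digest()

print(verify(old_address, block))        # True: existing sha256 address still checks out
print(verify(new_address, block))        # True: same data under a sha512 address
print(verify(new_address, b"tampered"))  # False
```

In this sketch, changing the default only affects how new addresses are produced. Existing sha256 addresses keep verifying, though anything that should be protected by the stronger function would still need to be rehashed, as noted earlier in the thread.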

I’m not concerned about someone finding another object with the same hash (the property that, in the case of password storage, makes sure passwords can’t be recovered from salted hashes), but since the hash is the identity of a file here, nothing is stopping you from finding 2 files with the same hash, and hence the same identity on IPFS.
(you didn’t try to find such a file, you just encountered it)

so this could be a threat to such a system’s resilience at scale.

How does IPFS deal with this?


Yes, of course for finance the hash is of great significance but the concern is one of more or less immediate security and preventing attack. From a preservation standpoint however, the value at stake is more than simply providing ongoing financial security. It is the existence of likely irreplaceable items, and since preservation entails keeping the item intact and accessible for eternity plus one day, that set of conditions affords much more opportunity for hash collisions to occur and to be of greater concern. Even so, I am eager to see IPFS-based preservation solutions.


Let’s ignore malicious attacks and possible weaknesses in SHA256. If IPFS aims to be planet scale, it should have a way to handle accidental collisions. What happens when there is a collision? How is it identified and handled? Can we switch the colliding objects to a different hash function?
