How do Merkle trees help with efficient verification in a system like IPFS?

I do not understand how a Merkle tree can be used to efficiently verify data in a peer-to-peer, distributed network like IPFS. I understand how the Merkle tree is constructed: take the data, break it into pieces, hash each piece, and then repeatedly hash pairs of hashes until a single hash, the Merkle root, remains.

But it is said that this Merkle root somehow makes verification of data more efficient, and this I do not get.

For example, let’s say Node A receives pieces of data from multiple peers before piecing them together. How do Merkle trees and Merkle roots help in verifying that the node got the right pieces?

  1. How does the node even retrieve the Merkle root to compare against?
  2. Okay, let us assume the node got a Merkle root from a trusted source. How does having the Merkle root and being able to compute a Merkle tree save it from having to hash all the pieces to get its own copy of the Merkle root for comparison? I usually read that the node does not need the whole tree but only a sub-tree, but I really do not get how that works.

Would anyone mind explaining? Or pointing to a well-explained piece that does not just describe how Merkle trees are created (almost everyone does this) but explains how they are actually used, and why that works.

It sounds like you might be mixing up a couple of things.

First, I think you are talking about “proof of inclusion”, as enabled by Merkle trees. This is a way to verify that something belongs to a Merkle tree given only the root and the path from that item up to the root (that is, the hashes of the siblings along the way). For a balanced binary tree over n pieces, that is only about log2(n) hashes. Essentially, you can prove that something is in the tree without having to download the full tree. That is neat for some blockchain applications but not relevant for IPFS.
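To make the "sub-tree" idea from the question concrete, here is a minimal sketch of a proof of inclusion in Python. It is only an illustration under simplifying assumptions: the function names (`build_levels`, `make_proof`, `verify`) are made up for this post, SHA-256 is an arbitrary choice, odd levels are padded by duplicating the last hash, and there is none of the hardening a production tree would use (for example, distinguishing leaf hashes from inner-node hashes).

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest; the hash function is an arbitrary choice for this sketch."""
    return hashlib.sha256(data).digest()


def build_levels(pieces):
    """Return every level of the tree: leaf hashes first, the root level last."""
    level = [h(p) for p in pieces]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Pad odd levels by duplicating the last hash (a simplifying assumption).
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect the sibling hash at each level on the way from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the other member of this pair
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(piece, proof, root):
    """Recompute the root from one piece plus its sibling hashes only."""
    acc = h(piece)
    for sibling_hash, sibling_is_left in proof:
        acc = h(sibling_hash + acc) if sibling_is_left else h(acc + sibling_hash)
    return acc == root


pieces = [f"piece-{i}".encode() for i in range(8)]
levels = build_levels(pieces)
root = levels[-1][0]
proof = make_proof(levels, 5)

print(len(proof))                         # 3 sibling hashes for 8 pieces
print(verify(pieces[5], proof, root))     # True
print(verify(b"tampered", proof, root))   # False
```

The verifier never sees the other seven pieces. It needs only the piece it cares about, three sibling hashes, and a root it already trusts.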

Second, IPFS uses Merkle-DAGs (not trees, per se). The relevant part is that IPFS is a content-addressed system, and in a Merkle-DAG every node is addressed by the hash of its content, which is how its parents link to it. Essentially, if you want to link content in a content-addressed system, the result is a Merkle-DAG.

IPFS does not need to prove that something is in a Merkle-DAG. It just needs to 1) resolve the CID/hash of something, sometimes by traversing such a DAG, and 2) verify that the data returned for that CID matches it.

All pieces of data are addressed by CID, so they can be verified individually (there is no need to piece the parts together and verify the result afterwards).
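The two steps above can be sketched with a toy content-addressed store in Python. This is not how IPFS is actually implemented: real CIDs wrap the hash in a multihash plus codec and version information, and real blocks use formats such as dag-pb/UnixFS. Here the "address" is just a SHA-256 hex digest, the root block is a small JSON list of links, and all function names are hypothetical.

```python
import hashlib
import json


def address(block: bytes) -> str:
    """Toy stand-in for a CID: the hex SHA-256 of the block's bytes."""
    return hashlib.sha256(block).hexdigest()


def put(store: dict, block: bytes) -> str:
    """Store a block under its own hash and return that address."""
    key = address(block)
    store[key] = block
    return key


def add_file(store: dict, data: bytes, chunk_size: int = 4) -> str:
    """Split data into chunks, store each one, then store a root block linking to them."""
    links = [put(store, data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    root_block = json.dumps({"links": links}).encode()
    return put(store, root_block)


def fetch_verified(store: dict, key: str) -> bytes:
    """Fetch a block from an untrusted store and check that it hashes to the requested address."""
    block = store[key]
    if address(block) != key:
        raise ValueError(f"block does not match address {key}")
    return block


def read_file(store: dict, root_key: str) -> bytes:
    """Resolve the root, then fetch and verify each linked chunk on its own."""
    root = json.loads(fetch_verified(store, root_key))
    return b"".join(fetch_verified(store, k) for k in root["links"])


store = {}  # imagine these blocks are spread across many untrusted peers
root_key = add_file(store, b"hello merkle dag")
print(read_file(store, root_key))  # b'hello merkle dag'
```

Each chunk is checked the moment it arrives, against the address that was used to request it. Because the root block contains the addresses of its children, trusting the root address is enough to trust everything reachable from it, and a bad chunk from one peer can be rejected and re-requested without touching the others.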

This indeed seems to have been the case. Thanks for pointing this out.

Basically, in IPFS the Merkle-DAG is self-validating, since the requested CID is some sort of Merkle root anyway.

That means that if a node requests data by CID, it can easily confirm, by computing the Merkle root (aka the CID), that it got all the pieces right. Would this be another way to put it?

Yes, that sounds right.
