I have a combined question and suggestion. I’ve been wondering why the Merkle tree root doesn’t contain a hash of the content. Its absence seems to be a source of some confusion, and I can think of situations where it would be nice to have. One source of confusion is that the CID is a hash of the contents, but only indirectly via the Merkle tree, so without the entire tree there is no way to verify the hash of the complete file. The other is that the same file can hash to different values depending on parameters such as the chunker used. Unfortunately, that means I can add the same file twice and get two different hashes. That is what it is, but there is no way to verify that they actually are the same file without retrieving both files, hashing the contents, and comparing the hashes.
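To make the chunker point concrete, here is a rough sketch in Python. It is not how IPFS actually builds its DAG (no UnixFS, no DAG-PB, no real CIDs); it assumes a plain binary Merkle tree over fixed-size chunks, and `merkle_root` and `split` are names I made up for illustration. It only shows that the same bytes give different roots under different chunk sizes, while a flat hash of the content stays the same.

```python
import hashlib


def split(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size chunks (a stand-in for a real chunker)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def merkle_root(data: bytes, chunk_size: int) -> str:
    """Root of a simplified binary Merkle tree (hypothetical, not IPFS's layout)."""
    # Leaves are the hashes of the individual chunks.
    level = [hashlib.sha256(c).digest() for c in split(data, chunk_size)]
    if not level:
        level = [hashlib.sha256(b"").digest()]
    # Each parent is the hash of its children's hashes concatenated.
    while len(level) > 1:
        level = [
            hashlib.sha256(b"".join(level[i:i + 2])).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# 1024 bytes of sample content.
data = b"".join(i.to_bytes(4, "big") for i in range(256))

root_a = merkle_root(data, 256)  # 4 leaves
root_b = merkle_root(data, 128)  # 8 leaves
flat = hashlib.sha256(data).hexdigest()

print(root_a == root_b)  # the roots differ although the content is identical
print(flat)              # the flat content hash does not depend on chunking
```

Knowing only `root_a` and `root_b`, I can't tell that they describe the same file. That is the gap a content hash stored in the root would fill.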
I get why you wouldn’t want to do that, because recomputing the hash of the entire file over and over again would get expensive, but I could imagine it being possible with something like lthash: https://github.com/lukechampine/lthash
If you had that, you could determine whether two files were the same by just retrieving their root nodes and comparing the lthashes. If gateways gave you a way to retrieve nodes as well as files, you could also retrieve a file’s root node, verify it against the CID, and compare the lthash, and then you wouldn’t need to trust the gateway.
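Here is a toy sketch of what I have in mind, again in Python. It does not use lthash's actual API, and it is not secure (the vector is far too small). I'm assuming the hashed elements are (offset, byte) pairs, which is my own choice so that the result doesn't depend on chunk boundaries; hashing every byte like this would be slow in practice. All names (`element_vector`, `chunk_sum`, `file_sum`) are hypothetical. The point is only that each node could store the sum for its own subtree, and a parent's value is the sum of its children's values.

```python
import hashlib

LANES = 16       # toy size: 16 lanes of 16 bits, taken from one SHA-256 digest
MOD = 1 << 16


def element_vector(offset: int, byte: int) -> list[int]:
    """Map one (offset, byte) pair to a vector of 16-bit lanes."""
    digest = hashlib.sha256(offset.to_bytes(8, "big") + bytes([byte])).digest()
    return [int.from_bytes(digest[2 * i:2 * i + 2], "big") for i in range(LANES)]


def add(a: list[int], b: list[int]) -> list[int]:
    """Lane-wise addition mod 2^16; this is the homomorphic combine step."""
    return [(x + y) % MOD for x, y in zip(a, b)]


def chunk_sum(chunk: bytes, start: int) -> list[int]:
    """Sum for one chunk; `start` is the chunk's offset within the file."""
    total = [0] * LANES
    for i, b in enumerate(chunk):
        total = add(total, element_vector(start + i, b))
    return total


def file_sum(data: bytes, chunk_size: int) -> list[int]:
    """Combine per-chunk sums, as a root node could do from its children."""
    total = [0] * LANES
    for start in range(0, len(data), chunk_size):
        total = add(total, chunk_sum(data[start:start + chunk_size], start))
    return total


data = b"".join(i.to_bytes(4, "big") for i in range(256))
tampered = b"X" + data[1:]

print(file_sum(data, 256) == file_sum(data, 100))      # same file, different chunking
print(file_sum(data, 256) == file_sum(tampered, 256))  # different content
```

With something like this in the root node, comparing two roots would be enough to tell whether two differently chunked trees hold the same bytes, provided you trust or can verify the per-node sums.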
Maybe I’m completely missing something, but I thought I’d throw it out there. If I am, maybe someone would be kind enough to let me know what.