File hash is different from original

Thank you for your reply. No, I was comparing the hashes of both files: the file before I added it, and the file after downloading it from here: https://ipfs.infura.io/ipfs/QmTCP7Ln1PL

When I ran the downloaded file through a SHA-256 tool, its hash was different from the hash of the initial file. I figured out that the file was incomplete after the add; for example, it was missing the background compared to the initial PDF file… no idea why.

Then I changed libraries, from ipfs-mini to ipfs-http-client, and it’s working OK now: both files, in and out, have the same hash.

OK, thanks for the info. I thought all these gateways were public and could be used to add files, like with ipfs.infura.io. So there is only one public gateway/point of entry that allows add? Is the best practice to set up my own gateway?

I’m trying to check if a file exists on IPFS, and I’m having the same issue where the hash of the original file does not match a downloaded copy. I’m using a very similar setup to @crashbdx. Environment: React/Node.js, with the ipfs-http-client and crypto npm packages.

Here are the operations I’m running, in order…

  1. A PDF is created on the fly using html-pdf npm plugin
  2. Save file to buffer
  3. Upload file to ipfs.infura.io using ipfs.add() from buffer
  4. Save the returned CID
  5. Get file from ipfs with ipfs.object.get(path)
  6. Save file.data to buffer
  7. Convert the buffer with JSON.stringify(buffer)
  8. Hash JSON.stringify(buffer) with crypto(sha256)
  9. Save the digest.

Now, when I visit the file in the URL and save the PDF to my local machine and run it through the same crypto(sha256), I get a completely different hash. However, when I run a test to ipfs.add(pdf) from my machine, it returns a matching CID.

Does anyone know why this is happening and how to fix it? Ideally I’d like to check if a file exists in IPFS without having to add it, that’s what I’m ultimately trying to do with the hash comparison.

Any help is greatly appreciated.

Thank you,
~Dan

If you take the buffer in step 2 and run it through the crypto(sha256), do you get the CID returned in 4? If so, then the retrieval is changing the content (unlikely). If you get a different hash than the returned CID, then CIDs aren’t just the sha256 hash of the content. (And I don’t think they are, actually).
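A minimal sketch of that check, assuming Node's built-in crypto module and a placeholder buffer standing in for the PDF (the content and the names sha256Hex and pdfBuffer are made up for illustration). It also hashes JSON.stringify(buffer), as in steps 7 and 8, to show that the stringified form is a different byte sequence from the raw file, which may be one reason the digests differ:

```javascript
const crypto = require('crypto');

// Hex SHA-256 digest of a Buffer or string.
function sha256Hex(bytes) {
  return crypto.createHash('sha256').update(bytes).digest('hex');
}

// Hypothetical stand-in for the PDF buffer from step 2.
const pdfBuffer = Buffer.from('%PDF-1.4 example bytes');

// Digest of the raw bytes, comparable to hashing the saved file on disk.
const rawDigest = sha256Hex(pdfBuffer);

// JSON.stringify(buffer) produces text like {"type":"Buffer","data":[37,80,...]},
// so this digest covers that text, not the file's bytes.
const jsonDigest = sha256Hex(JSON.stringify(pdfBuffer));

console.log('raw bytes:', rawDigest);
console.log('JSON text:', jsonDigest);
console.log('equal:', rawDigest === jsonDigest);
```

Neither digest is expected to look like the CID from step 4, since a CID is an encoded multihash rather than a hex digest.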

I might be wrong on this, but I believe the returned CID from ipfs.add is the CID of a protobuf wrapping meta-data about the file (like its name). That block points to the actual file contents block which would have a completely different CID, which may or may not match the result of your hash, but has a better chance of a match IMHO.
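To make the wrapping idea concrete, here is a rough sketch of how a "Qm…" CID could be derived for a small file. It rests on several assumptions that are not confirmed in this thread: CIDv0 with sha2-256, a file that fits in a single chunk, and the content wrapped in a UnixFS "File" message inside a dag-pb node with no links. The function names are hypothetical, and the byte layout is my reading of the formats, so treat it as an illustration rather than a reference implementation:

```javascript
const crypto = require('crypto');

const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Base58 (Bitcoin alphabet) encoding of a non-empty Buffer.
function base58btc(bytes) {
  let n = BigInt('0x' + bytes.toString('hex'));
  let out = '';
  while (n > 0n) {
    out = ALPHABET[Number(n % 58n)] + out;
    n /= 58n;
  }
  // Each leading zero byte is represented by a leading '1'.
  for (const b of bytes) {
    if (b !== 0) break;
    out = '1' + out;
  }
  return out;
}

// Protobuf varint encoding of a non-negative integer.
function varint(n) {
  const out = [];
  while (n >= 128) {
    out.push((n % 128) | 0x80);
    n = Math.floor(n / 128);
  }
  out.push(n);
  return Buffer.from(out);
}

// ASSUMPTION: single-chunk file, UnixFS "File" message inside a dag-pb node.
function wrapAsUnixfsFile(content) {
  const dataField = content.length
    ? Buffer.concat([Buffer.from([0x12]), varint(content.length), content])
    : Buffer.alloc(0);                            // empty content: field omitted
  const unixfs = Buffer.concat([
    Buffer.from([0x08, 0x02]),                    // field 1 (Type) = 2 (File)
    dataField,                                    // field 2 (Data) = file bytes
    Buffer.from([0x18]), varint(content.length),  // field 3 (filesize)
  ]);
  // dag-pb node: field 1 (Data) holds the UnixFS message; no links.
  return Buffer.concat([Buffer.from([0x0a]), varint(unixfs.length), unixfs]);
}

// CIDv0 = base58btc(0x12 0x20 || sha256(block)), i.e. a sha2-256 multihash.
function cidV0(block) {
  const digest = crypto.createHash('sha256').update(block).digest();
  return base58btc(Buffer.concat([Buffer.from([0x12, 0x20]), digest]));
}

const content = Buffer.from('hello world\n');
console.log('CID of wrapped block:', cidV0(wrapAsUnixfsFile(content)));
console.log('CID-style hash of raw bytes:', cidV0(content));
```

The two printed values differ because the first hashes the wrapper plus the content, while the second hashes the content alone.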

Remember that IPFS does “chunking” as well, so a straight hash of a buffer might only work for smaller than chunk-sized files. You’d really need to do your own content chunking and hashing of each chunk to get a CID that you can check for on IPFS. And even if all of the chunked CIDs exist, it doesn’t necessarily mean that the file you’re starting with is on IPFS, only that the chunks that make up the file are there, possibly as chunks of other files.
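As a sketch of what chunking means for hashing, the snippet below splits a buffer into fixed-size pieces and hashes each one. The 262144-byte chunk size is an assumption (I believe it is the default fixed-size chunker), and the function name chunkDigests is hypothetical. The digests are plain SHA-256 values of the raw chunks, not CIDs:

```javascript
const crypto = require('crypto');

// ASSUMPTION: fixed-size chunks of 262144 bytes (256 KiB).
const CHUNK_SIZE = 262144;

// Split a buffer into fixed-size chunks and hash each chunk separately.
function chunkDigests(buffer, chunkSize = CHUNK_SIZE) {
  const digests = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    const chunk = buffer.subarray(offset, offset + chunkSize);
    digests.push({
      offset,
      length: chunk.length,
      sha256: crypto.createHash('sha256').update(chunk).digest('hex'),
    });
  }
  return digests;
}

// A 600000-byte buffer of identical bytes: two full chunks plus a remainder.
const example = Buffer.alloc(600000, 0x61);
const chunks = chunkDigests(example);
console.log(chunks.map((c) => `${c.offset}+${c.length}: ${c.sha256}`).join('\n'));
```

The first two chunks hash identically here because their bytes are identical, which also illustrates how the same chunk can belong to more than one file.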

At least, that’s my understanding of the internals of IPFS from lots and lots of reading.

Try doing an ipfs object get on the CID you get back from the add. I suspect you’ll find that it is also a protobuf of links and data where the data is the file contents. I also suspect that only raw blocks added and retrieved with the block API would be usable given your approach.

You might want to read the original answer to the thread that you’ve picked up on. @hector says the same thing I did, but better.

Hi thank you for your response.

I’m not sure if I’m understanding this correctly, but taking step 2 and running it through crypto(sha256) gives me a different result than the CID in step 4. From my understanding, an IPFS CID is not simply the SHA-256 digest of the content.

However, I didn’t know about the “protobuf wrapping meta-data” so I’ll have a look at that in combination with the https://cid.ipfs.io/ tool.

You would think there would be a simple way to check for the existence of a file in IPFS. I can’t believe it’s this difficult.

Look at the arguments for ipfs add, in particular -n --only-hash: “Only chunk and hash - do not write to disk”. Might that give you a CID without actually adding the file to the swarm?

I added a file called “peers.txt” with a directory wrapper. That gave me CID QmVpAYxVUvBDSakUjNcF1Dv7dGVUtKpMv9SFgApxmraGhx.

An ipfs object get of that CID gives a content hash of Qma2kSdx4uh8VKzb8p8dqqBzDhoSFYVQBsYgHQTeCPbHWD.

Doing an ipfs add --only-hash peers.txt gives me that same content hash.

So, if you do an ipfs add without the --wrap-with-directory, the CID you get from that add should match the CID you get from an add --only-hash of the same file. So, to see if a file is in IPFS, you’d do an add --only-hash and then try to retrieve that CID from IPFS as a file, as an object, or even just as a block. If any of those give you something for the CID, then I believe you can assume the file is in IPFS.
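From Node.js, that procedure might be scripted roughly as follows. This is only a sketch: it assumes the ipfs command-line tool is installed and on the PATH, the helper names runIpfs and fileSeemsToBeOnIpfs are made up, and the 30-second timeout is an arbitrary choice. It uses ipfs block stat as the retrieval attempt:

```javascript
const { execFileSync } = require('child_process');

// Run the ipfs CLI and return trimmed stdout. ASSUMPTION: `ipfs` is on PATH.
function runIpfs(args) {
  return execFileSync('ipfs', args, {
    encoding: 'utf8',
    timeout: 30000,                     // give up after 30 s (arbitrary)
    stdio: ['ignore', 'pipe', 'ignore'],
  }).trim();
}

// Compute the CID without adding the file, then try to fetch that block.
// `run` is injectable so the logic can be tried without a running node.
function fileSeemsToBeOnIpfs(filePath, run = runIpfs) {
  // --only-hash: chunk and hash only; --quieter: print just the final hash.
  const cid = run(['add', '--only-hash', '--quieter', filePath]);
  try {
    run(['block', 'stat', cid]);
    return { cid, found: true };
  } catch (err) {
    // A failure or timeout means "not found from here", not proof of absence.
    return { cid, found: false };
  }
}

// Example (requires a local ipfs install):
// console.log(fileSeemsToBeOnIpfs('peers.txt'));
```

A negative result only says that no provider answered within the timeout, so it should be read as "probably not reachable" rather than "not on IPFS".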

You might even try an ipfs dht findprovs, but that particular query seems to take longer.

Thank you @ldeffenb for pointing me in the right direction.

I found this ipfs-only-hash plugin which calculates the IPFS hash for some data without having to install or run an IPFS node. Perfect for comparing files!