File hash is different from original

Hello everyone,

I'm new to IPFS and am testing it for possible use in a blockchain dapp. For now, there is something that is quite unclear to me:

When I add a file and later retrieve it through https://ipfs.infura.io/ipfs/QmdZTvj9…, then download the file and hash it with a sha256 tool, the sha256 hash of the downloaded file is different from that of the initial file. So is the file modified when it is uploaded? Is there a way to retrieve the exact initial file, with the same hash?

Thank you in advance.

The “Hash” returned by IPFS is not a hash, but a CID.

You can see the sha256 hash embedded in the CID using this tool: https://cid.ipfs.io/
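As a rough illustration of what such a CID inspector does for a CIDv0 string (the `Qm…` form), the sketch below base58-decodes the CID and strips the two-byte multihash prefix (`0x12` for sha2-256, `0x20` for a 32-byte digest). The helper names `base58Decode` and `sha256FromCidV0` are made up for this example, and the sample CID is the commonly seen one for an empty UnixFS directory, not a CID from this thread.

```javascript
// Base58 alphabet used by CIDv0 (the Bitcoin alphabet)
const ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Decode a base58 string into bytes (hypothetical helper for this sketch)
function base58Decode(str) {
  let num = 0n;
  for (const ch of str) {
    const idx = ALPHABET.indexOf(ch);
    if (idx === -1) throw new Error("invalid base58 character: " + ch);
    num = num * 58n + BigInt(idx);
  }
  const bytes = [];
  while (num > 0n) {
    bytes.unshift(Number(num & 0xffn));
    num >>= 8n;
  }
  // Each leading '1' character stands for one leading zero byte
  for (const ch of str) {
    if (ch !== "1") break;
    bytes.unshift(0);
  }
  return Uint8Array.from(bytes);
}

// Extract the hex sha256 digest embedded in a CIDv0 (hypothetical helper)
function sha256FromCidV0(cid) {
  const bytes = base58Decode(cid);
  // A CIDv0 is a bare multihash: 0x12 (sha2-256), 0x20 (32 bytes), then the digest
  if (bytes.length !== 34 || bytes[0] !== 0x12 || bytes[1] !== 0x20) {
    throw new Error("not a CIDv0 with a sha2-256 multihash");
  }
  return Buffer.from(bytes.slice(2)).toString("hex");
}

// Sample CID: the empty UnixFS directory (an assumption for the demo)
console.log(sha256FromCidV0("QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn"));
```

The digest printed this way is the hash of the encoded block that the CID points to, which is generally not the sha256 of the file contents.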

But files are chunked (by default into 256 KiB blocks) and also, by default, wrapped in a structure called a UnixFS node, so that embedded hash is not the sha256 of your file contents. You can see how your file was chunked and wrapped at https://explore.ipld.io/

For things <1MB you can increase the chunking size when you add (so that you get a single chunk), and you can enable --raw-leaves, so that the resulting single chunk is not wrapped. In that case, you will get a CID which embeds a sha256 hash, which corresponds to that returned by the sha256 tool.
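To illustrate the single-chunk, raw-leaves case, here is a minimal sketch that builds the CID such a block would be expected to have, directly from the file bytes. It assumes the result is a CIDv1 with the raw codec (`0x55`) and a sha2-256 multihash, encoded as lowercase base32 with the `b` multibase prefix. The helper names `base32Encode` and `rawLeafCid` are invented for this example, and the input is just the string "hello world".

```javascript
const { createHash } = require("crypto");

const BASE32 = "abcdefghijklmnopqrstuvwxyz234567"; // RFC 4648 alphabet, lowercase

// Unpadded base32 encoding (hypothetical helper for this sketch)
function base32Encode(bytes) {
  let bits = 0;  // number of bits currently buffered
  let value = 0; // bit buffer
  let out = "";
  for (const byte of bytes) {
    value = (value << 8) | byte;
    bits += 8;
    while (bits >= 5) {
      out += BASE32[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
    value &= (1 << bits) - 1; // keep only the bits not yet emitted
  }
  if (bits > 0) out += BASE32[(value << (5 - bits)) & 31];
  return out;
}

// Build the CIDv1 (raw codec) for a single unwrapped block
function rawLeafCid(data) {
  const digest = createHash("sha256").update(data).digest();
  // version 0x01, codec 0x55 (raw), multihash 0x12 (sha2-256) + 0x20 (32 bytes)
  const cidBytes = Buffer.concat([Buffer.from([0x01, 0x55, 0x12, 0x20]), digest]);
  return { sha256: digest.toString("hex"), cid: "b" + base32Encode(cidBytes) };
}

const result = rawLeafCid(Buffer.from("hello world"));
console.log(result.sha256); // b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
console.log(result.cid);    // bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e
```

The last 32 bytes inside the CID are exactly the digest a plain sha256 tool reports for the same bytes, which is why the two match only in this unwrapped, single-block case.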

1MB is a network constraint. Peers refuse to transfer things bigger than that (it may be 2MB, I can’t remember), so you can use bigger chunks but peers will not be able to download them.

Thanks a lot for your detailed answer. It's interesting, but actually my concern is not the hash that IPFS returns. When I download the file itself, how come the sha256 hash of the downloaded file is different from the original? Does it mean the file was modified in the process?

ah, ok, I understand now. Well, I don’t know why it’s different. It should not be.

Can you check the differences or diff between the two somehow?
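One way to do such a comparison is a small byte-level check that reports whether the downloaded copy is identical, cut short, or changed somewhere in the middle. This is only a sketch: the function name `describeDifference` and the sample buffers are made up here, and in practice the two buffers would come from reading the original and the downloaded file.

```javascript
// Compare two byte buffers and classify how the second differs from the first
function describeDifference(original, downloaded) {
  const shared = Math.min(original.length, downloaded.length);
  let firstDiff = -1;
  for (let i = 0; i < shared; i++) {
    if (original[i] !== downloaded[i]) {
      firstDiff = i;
      break;
    }
  }
  const lengths = {
    originalLength: original.length,
    downloadedLength: downloaded.length
  };
  if (firstDiff !== -1) {
    // Bytes differ inside the part both files have in common
    return { kind: "modified", firstDiff, ...lengths };
  }
  if (original.length === downloaded.length) {
    return { kind: "identical", ...lengths };
  }
  // Same prefix but different length: data is missing or was appended
  const kind = downloaded.length < original.length ? "truncated" : "extended";
  return { kind, ...lengths };
}

// Demo with in-memory data standing in for the two files
const original = Buffer.from("hello world");
const downloaded = Buffer.from("hello");
console.log(describeDifference(original, downloaded).kind); // truncated
```

A "truncated" result points at an incomplete upload or download, while "modified" would suggest the bytes were altered or re-encoded along the way.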

Thanks a lot for your help, here is my test:

import IPFS from "ipfs-mini";
import { sha256 } from "js-sha256";
import { Buffer } from "buffer";

// Single client pointed at the Infura API endpoint
const ipfs = new IPFS({
  host: "ipfs.infura.io",
  port: 5001,
  protocol: "https"
});

function upload() {
  const reader = new FileReader();
  reader.onloadend = function() {
    const buf = Buffer.from(reader.result); // Convert the ArrayBuffer into a Buffer
    console.log(sha256(buf)); // sha256 of the original file contents
    ipfs
      .add(buf)
      .then(resAddFile => {
        console.log(resAddFile);
        // Store the file name and its gateway URL as a JSON object
        return ipfs.addJSON({
          file_name: "mon_fichier.pdf",
          ipfs_file_hash: "https://ipfs.infura.io/ipfs/" + resAddFile // or https://gateway.ipfs.io/ipfs/
        });
      })
      .then(resAddJson => ipfs.catJSON(resAddJson))
      .then(resCatJson => {
        // Point the download link at the uploaded file
        document.getElementById("url").innerHTML = resCatJson.file_name;
        document.getElementById("url").href = resCatJson.ipfs_file_hash;
        console.log(resCatJson);
      })
      .catch(err => {
        console.error(err);
      });
  };
  const photo = document.getElementById("photo");
  reader.readAsArrayBuffer(photo.files[0]); // Read the selected file
}

document.getElementById("uploadImage").addEventListener("click", upload);

For info, the console.log(sha256(buf)); gives me the proper hash, which is 9bc7f6263686a03017d2b59f70f548d6b037ea305237c63e3adf2bd47b28d91f, but the file downloaded from IPFS gives me 880b7c72db8212320e4d347f3c71c76ec4ac70ada098da5712369c5ab78feb35

Are you sure you are hashing what you think you are hashing? Try with a small text file and compare with another sha256 calculator.
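For an independent check outside the browser, a few lines of Node.js can serve as that second calculator. This is a sketch assuming Node's built-in `crypto` module is available; the function name `sha256OfFile` and the temporary file name are made up for the example.

```javascript
const { createHash } = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hash the exact bytes stored on disk, with no text decoding in between
function sha256OfFile(filePath) {
  return createHash("sha256").update(fs.readFileSync(filePath)).digest("hex");
}

// Demo: write a small text file and hash it
const samplePath = path.join(os.tmpdir(), "ipfs-hash-check.txt");
fs.writeFileSync(samplePath, "hello world");
console.log(sha256OfFile(samplePath));
// b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
```

Running the same function on the original file and on the copy downloaded from the gateway shows whether the two really differ, independent of any online tool.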

I'm quite sure, because I check the hash of what I add to IPFS before adding it, and it's the same hash as when I hash the same file in online tools. Then I later download the file from IPFS, hash it in the same online tools, and it's a different hash. I made a small video to show it. I noticed that when I download the IPFS file and open it, for some reason it asks me if I want to save the changes, although I didn't make any changes.

Here the example: https://youtu.be/T_-9SzcwhZI

It seems that the file ends up not fully uploaded. For example, I tested with the PDF file on the left, and IPFS ended up with the one on the right, where the background is missing:

Any idea what is wrong? The buffer I pass to ipfs.add is correct, because it has the same hash as the initial file.

I’m running some tests, and the Infura gateway tends to stall and close the connection pretty early, but once I had fully downloaded my file, the hash was the expected one. Switch to the ipfs.io gateway and test again. Also, try with a smaller text file to verify, and check what sort of corruption is happening (i.e. truncation or something else).

Thank you for your time. Here is an example that shows the file is truncated or incomplete:

I just tested with a test.txt file containing only hello! (still using the Infura host) and the hashes match correctly.

If I replace host: “ipfs.infura.io” with host: “ipfs.io”, I get ERR_CONNECTION_TIMED_OUT. What is the correct host, please?

Any of the gateways listed at https://ipfs.github.io/public-gateway-checker/. But sometimes a gateway (any gateway) may time out trying to find the content.

However, ERR_CONNECTION_TIMED_OUT is not something I would expect, as that means that the gateway is not even reachable from your location.

gateway-pinata-cloud gives me ERR_CONNECTION_REFUSED
gateway-ipfs-io gives me ERR_CONNECTION_TIMED_OUT
cloudflare-ipfs-com gives me ERR_CONNECTION_TIMED_OUT
ipfs-eternum-io gives me ERR_CONNECTION_TIMED_OUT

Basically only Infura responds, but it seems to close the connection before the add finishes (it still gives me a hash), and I end up with a truncated file. My internet seems to work correctly. Maybe there is a blacklist for Philippine IPs?

That might be the case… is website blocking common practice there?

I have faced that a few times for online payments, but for something like file sharing it's the first time. I will test tomorrow from my DigitalOcean hosting, since the droplet is in another country, and see if it works. I was testing locally from the Philippines. I'll keep you updated. Thanks for your time.

Hmm, well, actually it might not change anything since it's client-side :/

So I changed my code a bit and switched to a new lib, ipfs-http-client, instead of the ipfs-mini I was using yesterday. Now things seem to work well and the final file has the correct hash. Still, I'm unable to use any host from the IPFS gateway checker list except Infura…

Here is the current code:

import { sha256 } from "js-sha256";
import ipfsClient from "ipfs-http-client";
import "regenerator-runtime/runtime";
import { Buffer } from "buffer";

const ipfs = ipfsClient({
  host: "ipfs.infura.io",
  port: "5001",
  protocol: "https"
});

// Logs the byte count reported by the progress callback
const progressFunc = function(len) {
  console.log("File progress:", len);
};

// ipfs.add returns an async iterable that yields one result per added entry
const addFile = async function(files) {
  for await (const result of ipfs.add(files, { progress: progressFunc })) {
    console.log(result);
  }
};

function upload() {
  const reader = new FileReader();
  reader.onloadend = function() {
    const buf = Buffer.from(reader.result); // Convert the ArrayBuffer into a Buffer
    const files = [
      {
        content: buf
      }
    ];
    console.log(sha256(buf)); // sha256 of the original file contents
    addFile(files);
  };
  const photo = document.getElementById("photo");
  reader.readAsArrayBuffer(photo.files[0]); // Read the selected file
}

document.getElementById("uploadImage").addEventListener("click", upload);

I have another problem here with the progress not showing until the end, but that's off topic for now.

I would still love to understand why I can't use any other host. Is this the right place to ask?

I can’t debug from your location. It seems you cannot connect. You should try tracerouting, verifying that DNS is correct, etc. to diagnose the problem, but it seems to be essentially a network problem.

Thanks for your reply. I'm not sure how to traceroute that, to be honest, but I put a quick test online where the host can be changed for easy testing.

Would you mind trying this: https://test-ipfs-ph.herokuapp.com/

Try it with Infura and with some other public gateway hosts, and let me know if it works for you. On my side only Infura works.

Sorry, I didn't add any visual response, so you must also open the console when testing.

PS: Some friends in France tried it and got the same result: only Infura works.

The hashes are different because you’re comparing the hash of the file data vs the hash of the DAG that represents the file.

There’s a great blog post about it here: https://medium.com/textileio/whats-really-happening-when-you-add-a-file-to-ipfs-ae3b8b5e4b0f

The gateways you’re trying to connect to are meant to be used to download content from IPFS via HTTP, not as programmatic APIs.

So, for example, when I paste “gateway.pinata.cloud” into the gateway box on your Heroku demo app, everything times out, but I can fetch IPFS content from that gateway by visiting the URL https://gateway.pinata.cloud/ipfs/Qmfoo, where Qmfoo is a hash I want to download.
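To make the distinction concrete, the sketch below builds the two kinds of URL side by side. The function names `gatewayUrl` and `apiAddUrl` are made up for this illustration. The `/ipfs/<cid>` path is the read-only gateway route shown above, while `/api/v0/add` on port 5001 is the HTTP API route that client libraries use for adding content, which a public gateway will typically not expose.

```javascript
// Read-only gateway URL: fetch existing content by CID over plain HTTP(S)
function gatewayUrl(gatewayHost, cid) {
  return `https://${gatewayHost}/ipfs/${cid}`;
}

// HTTP API URL: the endpoint a client library posts to when adding a file
function apiAddUrl(apiHost, port = 5001) {
  return `https://${apiHost}:${port}/api/v0/add`;
}

console.log(gatewayUrl("gateway.pinata.cloud", "Qmfoo"));
// https://gateway.pinata.cloud/ipfs/Qmfoo
console.log(apiAddUrl("ipfs.infura.io"));
// https://ipfs.infura.io:5001/api/v0/add
```

Putting a gateway hostname into the client's host option makes the library call the second kind of URL on a server that only serves the first kind, which would be consistent with the timeouts and refused connections reported above.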