Generate file hash in golang with go-cid

Hello everyone,

I am trying to produce the same file hashes as IPFS using the go-cid package. Here’s my code:

package main

import (
	"fmt"
	"log"
	"os"

	cid "github.com/ipfs/go-cid"
)

func main() {
	// Read the whole file into memory.
	msg, err := os.ReadFile("/tmp/b.txt")
	if err != nil {
		log.Fatal(err)
	}

	// V0Builder hashes the given bytes with SHA2-256 and wraps the digest in a CIDv0.
	c, err := cid.V0Builder{}.Sum(msg)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Created CID: ", c.Hash().B58String())
}

The file content of b.txt is:

Hello World!

My issue is that:

  • The code outputs QmNbCYUrvaVfy6w9b5W3SVTP2newPK5FoeY37QurUEUydH
  • ipfs add /tmp/b.txt outputs QmfM2r8seH2GiRaC4esTjeraXEachRt8ZsSeGaWTPLyMoG

Do you know why I don’t get the same hashes?

Thank you very much in advance,


Because a CID isn’t the hash of the file; it is the hash of the root block.

First the file is chunked, and each chunk is wrapped in a block called a leaf.
Each leaf is hashed and has its own CID.
Then the leaves are linked together in a block called a root until that block is full. If your file is big enough you will have multiple roots, because a single root can’t hold all the links.
All those roots are also hashed and have their own CIDs.

This linking process is repeated, linking multiple roots together as needed (for example, if your file is big enough to have 3 roots, a new root linking the 3 underlying roots is made).

Once only 1 root is left, its CID is what IPFS shows as the CID of the whole file.

BTW, in case this isn’t clear, a root is just a list of CIDs; the content of all its links appended together is the content of the root.

The docs treat the CID as the hash of the file because, security-wise, it mostly is, although in reality it’s a chain of hashes.

Ok thank you!!

I just didn’t expect any chunking to happen for a Hello World! file, haha.

Thank you again!

“Hello World!” isn’t chunked :smiley: you are correct, but it’s still wrapped in a UnixFS serialized object. The leaves aren’t plain blobs. Roots and leaves are actually the same kind of object: a leaf has no further links and has its data field (a byte blob) filled, while a root contains at least 1 link and may also have some data, which in that case is appended after each link.
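
For a file small enough to fit in one chunk, that wrapping can be sketched by hand with only the standard library. This is a sketch under assumptions, not the actual importer: the protobuf field numbers (UnixFS Type=1, Data=2, filesize=3; dag-pb Data=1) are my reading of the UnixFS and dag-pb schemas, the helper names are made up for illustration, and it only covers a single leaf with no links under the default ipfs add settings (CIDv0, SHA2-256).

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/big"
)

// appendVarint appends v encoded as a protobuf varint.
func appendVarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// encodeUnixFSLeaf wraps data in a UnixFS "File" message, then in a
// dag-pb node with no links (field numbers are an assumption, see above).
func encodeUnixFSLeaf(data []byte) []byte {
	size := uint64(len(data))
	unixfs := []byte{0x08, 0x02} // field 1 (Type), varint: 2 = File
	if size > 0 {
		unixfs = append(unixfs, 0x12) // field 2 (Data), length-delimited
		unixfs = appendVarint(unixfs, size)
		unixfs = append(unixfs, data...)
	}
	unixfs = append(unixfs, 0x18) // field 3 (filesize), varint
	unixfs = appendVarint(unixfs, size)

	node := []byte{0x0a} // dag-pb field 1 (Data), length-delimited
	node = appendVarint(node, uint64(len(unixfs)))
	return append(node, unixfs...)
}

// sha256Multihash prefixes the digest with 0x12 (sha2-256) and 0x20 (32 bytes).
func sha256Multihash(b []byte) []byte {
	sum := sha256.Sum256(b)
	return append([]byte{0x12, 0x20}, sum[:]...)
}

// base58Encode encodes b with the Bitcoin base58 alphabet used by CIDv0.
func base58Encode(b []byte) string {
	const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
	n := new(big.Int).SetBytes(b)
	radix, mod := big.NewInt(58), new(big.Int)
	var out []byte
	for n.Sign() > 0 {
		n.DivMod(n, radix, mod)
		out = append(out, alphabet[mod.Int64()])
	}
	for _, v := range b { // each leading zero byte becomes a '1'
		if v != 0 {
			break
		}
		out = append(out, '1')
	}
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

// cidV0ForSmallFile returns the CIDv0 string of the wrapped leaf block.
func cidV0ForSmallFile(data []byte) string {
	return base58Encode(sha256Multihash(encodeUnixFSLeaf(data)))
}

func main() {
	data := []byte("Hello World!\n") // assumes b.txt ends with a newline
	fmt.Println("hash of raw bytes:   ", base58Encode(sha256Multihash(data)))
	fmt.Println("hash of wrapped leaf:", cidV0ForSmallFile(data))
}
```

The first line hashes the file bytes directly, which is what the go-cid snippet in the question does; the second hashes the wrapped block, which is the value to compare against ipfs add. Larger files also need the chunking and linking steps, so in practice it is easier to let the IPFS importer libraries build the DAG.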
