Understanding CID

I am trying to figure out how CIDs are generated and how to extract the information that cid.ipfs.io shows, but I am having a hard time.

I have a CID in base32: bafybeidyxh2cyiwdzczgbn4bk6g2gfi6qiamoqogw5bxxl5p6wu57g2ae
When I convert it to binary I get this:

00001000 00001011 10000000 10010001 00000011 11000101 11001111 10100001 01100001 00010110 00011110 01000101 10010011 00000101 10111100 00001010 10111100 01101101 00011000 10101000 11110100 00010000 00000110 00111010 00001110 00110101 10111010 00011011 11011101 01111101 01111111 10101101 01001110 11111100 11011010 00000001

On https://cid.ipfs.io/#bafybeidyxh2cyiwdzczgbn4bk6g2gfi6qiamoqogw5bxxl5p6wu57g2ahy I get the following info:

HUMAN READABLE CID
base32 - cidv1 - dag-pb - sha2-256-256-78b9f42c22c3c8b260b781578da3151e8200c741c6b7437bafaff5a9df9b403e

MULTIBASE
code: b
name: base32

MULTICODEC
code: 0x70
name: dag-pb

MULTIHASH
code: 18
name: sha2-256
bits: 256

The multihash digest, extracted in hex form, is stated to be 78b9f42c22c3c8b260b781578da3151e8200c741c6b7437bafaff5a9df9b403e

In binary form it is:

01111000 10111001 11110100 00101100 00100010 11000011 11001000 10110010 01100000 10110111 10000001 01010111 10001101 10100011 00010101 00011110 10000010 00000000 11000111 01000001 11000110 10110111 01000011 01111011 10101111 10101111 11110101 10101001 11011111 10011011 01000000 00111110

But that bit string does not appear byte-aligned in my binary form of the CID.

On the other hand I can find the substring 01111000101110011111010000101100001000101100001111001000101100100110000010110111100000010101011110001101101000110001010100011110100000100000000011000111010000011100011010110111010000110111101110101111101011111111010110101001110111111001101101000000001

where the last 11110 is cut off.

How exactly is the hash extracted?

Where is the version cidv1 included? Is it implicit?
How is 0x70 extracted?
Could someone explain each part of the CID?

I’d like to generate the base32 CID string from the content hash and the other information.

I have another example: bafybeiczsscdsbs7ffqz55asqdf3smv6klcw3gofszvwlyarci47bgf354

Note that both of these examples are dag-pb folders.

Thanks

This CID should be bafybeidyxh2cyiwdzczgbn4bk6g2gfi6qiamoqogw5bxxl5p6wu57g2ahy, right?

Also note, the leading b is a multibase prefix and isn’t part of the encoded data. When reading a CID, we:

  1. Chop off the first character.
  2. Look it up in the multibase table (see the multiformats/multibase repository on GitHub).
  3. Decode the rest.
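As a rough illustration of those three steps, here is a minimal Python sketch. It assumes the prefix is `b` (base32 lower-case, no padding) and does not handle any other multibase; the helper name `decode_multibase_base32` is made up for this example. Python's `base64.b32decode` expects upper-case input with `=` padding, so the sketch restores both before decoding.

```python
import base64

def decode_multibase_base32(cid_text: str) -> bytes:
    """Strip the multibase prefix and decode the remainder as base32."""
    prefix, body = cid_text[0], cid_text[1:]      # step 1: chop off the first character
    if prefix != "b":                             # step 2: only 'b' (base32) is handled here
        raise ValueError(f"unsupported multibase prefix: {prefix!r}")
    # step 3: decode the rest; re-add the padding that multibase base32 omits
    padded = body.upper() + "=" * (-len(body) % 8)
    return base64.b32decode(padded)

raw = decode_multibase_base32(
    "bafybeidyxh2cyiwdzczgbn4bk6g2gfi6qiamoqogw5bxxl5p6wu57g2ahy"
)
print(len(raw))   # 36
print(raw.hex())  # 0170122078b9f42c...9b403e
```

The 36 bytes are 4 header bytes followed by the 32-byte digest reported by cid.ipfs.io.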

Where is the version cidv1 included? Is it implicit?

When decoded correctly, the first byte should be 0x1.

How is 0x70 extracted?

That’s the second byte. Well, technically, all these numbers are base128 varints but anything less than 128 will fit in a single byte.
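To make the layout concrete, here is a small sketch that reads the header fields as unsigned base128 varints (7 data bits per byte, high bit set means "more bytes follow"). The function name `read_varint` is just for this example, and the input bytes are the decoded form of the CID from this thread, written out in hex.

```python
def read_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Read one unsigned base128 varint; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry data
        if byte & 0x80 == 0:              # high bit clear: last byte of this varint
            return value, offset
        shift += 7

# Decoded bytes of bafybeidyxh2cyiwdzczgbn4bk6g2gfi6qiamoqogw5bxxl5p6wu57g2ahy
raw = bytes.fromhex(
    "01701220"
    "78b9f42c22c3c8b260b781578da3151e8200c741c6b7437bafaff5a9df9b403e"
)

version, pos = read_varint(raw)          # CID version
codec, pos = read_varint(raw, pos)       # multicodec (content type)
hash_code, pos = read_varint(raw, pos)   # multihash function code
digest_len, pos = read_varint(raw, pos)  # digest length in bytes
digest = raw[pos:pos + digest_len]

print(version, hex(codec), hex(hash_code), digest_len)  # 1 0x70 0x12 32
print(digest.hex())
```

So the version is the first byte (0x01), dag-pb is the second (0x70), and the multihash is everything after that: 0x12 (sha2-256), 0x20 (32 bytes), then the digest.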

Check out the multiformats/cid repository on GitHub (self-describing content-addressed identifiers for distributed systems).
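Going the other way (building the base32 string from a digest) is the same layout in reverse. The sketch below is only an illustration: `make_cidv1_base32` is a made-up helper, and it assumes every header value is below 128 so that each varint is a single byte. Note that the digest here is the sha2-256 of the encoded dag-pb block, not of the raw file contents.

```python
import base64

def make_cidv1_base32(codec: int, hash_code: int, digest: bytes) -> str:
    """Assemble <version><codec><hash code><digest length><digest>, then base32-encode."""
    for value in (codec, hash_code, len(digest)):
        if value >= 128:
            # Larger values need multi-byte varints, which this sketch skips.
            raise ValueError("this sketch only handles single-byte varints")
    raw = bytes([0x01, codec, hash_code, len(digest)]) + digest
    body = base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    return "b" + body  # 'b' is the multibase prefix for base32

digest = bytes.fromhex(
    "78b9f42c22c3c8b260b781578da3151e8200c741c6b7437bafaff5a9df9b403e"
)
cid = make_cidv1_base32(codec=0x70, hash_code=0x12, digest=digest)
print(cid)  # bafybeidyxh2cyiwdzczgbn4bk6g2gfi6qiamoqogw5bxxl5p6wu57g2ahy
```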

cc @olizilla


Hi @stebalien

You're right, it is bafybeidyxh2cyiwdzczgbn4bk6g2gfi6qiamoqogw5bxxl5p6wu57g2ahy
The tool I used for the conversion (https://cryptii.com/pipes/hex-to-base32) converted it back in the wrong format.

  1. Chop off the first character.

Ah, OK! So you chop off only the first 5 bits then! I thought I had to chop off the whole byte.

Thank you!

Specifically, you chop off the first symbol. We can’t really talk about bits/bytes at that point because we’re working with text and multibase works regardless of the encoding.
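A quick way to see the effect, as a sketch: map every base32 symbol to its 5-bit value and compare the bit string with and without the leading `b`. The alphabet is the lower-case RFC 4648 base32 alphabet; `symbols_to_bits` is a helper invented for this example.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"  # RFC 4648 base32, lower-case

def symbols_to_bits(text: str) -> str:
    """Concatenate the 5-bit value of each base32 symbol."""
    return "".join(format(ALPHABET.index(ch), "05b") for ch in text)

cid = "bafybeidyxh2cyiwdzczgbn4bk6g2gfi6qiamoqogw5bxxl5p6wu57g2ahy"

with_prefix = symbols_to_bits(cid)          # prefix wrongly treated as data
without_prefix = symbols_to_bits(cid[1:])   # prefix symbol removed first

print(with_prefix[:24])     # 000010000000101110000000
print(without_prefix[:24])  # 000000010111000000010010
```

The first line matches the start of the binary dump in the question, where everything is shifted by 5 bits. The second line is 0x01 0x70 0x12, i.e. the version, the codec, and the hash function code.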
