Receive a file with a known SHA256 sum from IPFS

Hey guys,

I’m trying to find information on how to calculate the CIDv0 or CIDv1 from a known SHA256 sum. I couldn’t find anything on that, other than that SHA256 is used in the CID.

I want to be able to retrieve a file from IPFS without knowing the CIDv0/1, knowing only the SHA256 sum.

Best regards

I don’t think this is possible. For any given file, it’s possible to construct multiple valid CIDs with different properties.

So there’s no way to calculate the CID from the SHA256 hash of a file? Even if I know the file was just added with the default options via ‘ipfs add file.bin’?

Not that I’m aware of. At least, not unless you want to keep a table of hashes and known CIDs. Someone else might know of a more recent change that would make this possible.

Why aren’t you just using the CID? If you added it, then you know the CID.

I’m thinking about creating a cache program for files. It reads a file format that contains a URL and a SHA256 sum.

So if someone else has used the program before, I don’t have to load the file from the URL but can request it from IPFS.

Since I cannot change the format itself, I either have to attach the SHA256 sum to the files that get added, in a way that makes it queryable, or I need a way to convert the SHA256 sum to the CID.

I think I’m missing something. Why can’t your program store a URL and CID? I’m missing why the SHA256 is required instead of something that IPFS can natively use.

As a tangent, this sounds related to the experimental urlstore feature. As part of it, you can pass in a URL and get a CID for the file. Then nodes can request the file through your node using the CID.

I want to work with publicly available files which just contain the URL and the SHA256 sum.

Sure, I can download a file, calculate the SHA256 sum and then add it to IPFS to get the CID.

But: how does the next client avoid downloading the file again just to get the CID, when all it has is the publicly available file with the URL and SHA256 sum?

I can store the relationship between SHA256 and CID locally, but I have no idea how to store such a relationship in IPFS in a way that I can request the CID from IPFS when I only know the SHA256.

Correct me if I’m wrong, but:

URLstore is just a feature to avoid storing a file locally when you have a webserver that serves this file reliably anyway.

Not storing the file locally doubles the traffic on all nodes, since they first have to download the file from the webserver whenever it is requested via IPFS, instead of being able to read it from the local disk.

Where is the SHA256 hash? Encoded into the URL? Is this just for a specific site, or are you trying to do something for URL caching in general?

Someone else probably has a better idea, but it seems like you need to store a hash of the URL itself (not the file) and CID. Or the whole URL and CID.

However, I think you still have the problem of keeping track of which URLs have already been cached, along with the corresponding CID, so that the next time someone looks up a URL they can:

  1. Check if the file has already been cached
  2. Retrieve the file over IPFS using the CID (if #1 passes)
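
The two steps above could be sketched roughly like this, assuming some table that maps SHA256 sums to CIDs already exists. The table `known_cids` and the two callables `fetch_from_ipfs` and `fetch_from_url` are hypothetical placeholders for illustration, not part of any IPFS API.

```python
import hashlib
from typing import Callable, Dict


def get_file(
    url: str,
    sha256_hex: str,
    known_cids: Dict[str, str],               # hypothetical table: SHA256 hex -> CID
    fetch_from_ipfs: Callable[[str], bytes],  # hypothetical: returns the bytes for a CID
    fetch_from_url: Callable[[str], bytes],   # hypothetical: plain download of the URL
) -> bytes:
    """Prefer IPFS when a CID is known for this SHA256 sum, else use the URL."""
    cid = known_cids.get(sha256_hex.lower())
    if cid is not None:
        data = fetch_from_ipfs(cid)   # step 2: file was cached before
    else:
        data = fetch_from_url(url)    # not cached yet: fall back to the URL

    # Either way, check the result against the published SHA256 sum.
    if hashlib.sha256(data).hexdigest() != sha256_hex.lower():
        raise ValueError("SHA256 mismatch for " + url)
    return data
```

This only covers the lookup; how the table itself gets shared between clients is the open question in this thread.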

This isn’t something I’m sure how best to do for your use case.

I think I found a solution:

I’ll just create a folder where all files are stored, and the files are just named with the SHA256 hash.

This way I can access the files via an IPNS name.

This is obviously a cluster-style solution: it needs one or more computers that download the file, store it in the folder, and update the IPNS name to point to the new IPFS folder CID. But it works fine for me.
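
A minimal sketch of how a client could build the lookup path under this scheme, assuming the folder is published under a single IPNS name and every file is named by its lowercase hex SHA256 sum. The name `example-ipns-name` below is a made-up placeholder.

```python
import hashlib

HEX_DIGITS = set("0123456789abcdef")


def file_name_for(data: bytes) -> str:
    """Name a file is stored under in the shared folder: its SHA256 in hex."""
    return hashlib.sha256(data).hexdigest()


def ipns_path_for(ipns_name: str, sha256_hex: str) -> str:
    """Path of a file inside the IPNS-published folder, given only its SHA256."""
    sha256_hex = sha256_hex.lower()
    if len(sha256_hex) != 64 or not set(sha256_hex) <= HEX_DIGITS:
        raise ValueError("expected a 64-character hex SHA256 sum")
    return "/ipns/" + ipns_name + "/" + sha256_hex


if __name__ == "__main__":
    # 'example-ipns-name' is a hypothetical placeholder, not a real name.
    print(ipns_path_for("example-ipns-name", file_name_for(b"some file content")))
```

A path of this shape could then be requested through a gateway or a local node. The client should still hash what it receives, because the file name is only a label chosen by whoever maintains the folder.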

Hello,

I try to find information on how to calculate the CIDv0 or CIDv1 from a known SHA256 sum

A CID is just a SHA-256 hash (or any other hash type) wrapped with some prefixes, so you can perfectly well do this, usually programmatically. whyrusleeping/elcid, a CID encoder/decoder tool, is an example. It does not have a sha256-hex type, but you can get the idea of how things work (https://github.com/whyrusleeping/elcid/blob/master/main.go#L80).

Anyway, the reason we don’t do this (and why there’s no tooling for it) is that IPFS splits files into 256KB chunks by default. Therefore the SHA256 hash inside the IPFS CID will usually not correspond to the regular SHA256 of your file.bin (unless the file is smaller than 256KB and was added as a raw leaf).

Unfortunately, this means that in the vast majority of cases there is no way to figure out the IPFS CID from the original sha256sum of a file.
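
For the one case where it does work, here is a rough sketch of the wrapping, using only the Python standard library. It assumes the usual CIDv1 layout: version byte 0x01, the raw codec 0x55, then a multihash made of the sha2-256 code 0x12, the length 0x20 and the digest, all base32-encoded with a leading "b". The function name is made up for illustration. The result only matches what IPFS holds if the file was stored as a single raw block.

```python
import base64
import hashlib


def cidv1_raw_from_sha256(sha256_hex: str) -> str:
    """Wrap a plain SHA-256 digest as a CIDv1 with the 'raw' codec (illustrative)."""
    digest = bytes.fromhex(sha256_hex)
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")

    multihash = bytes([0x12, 0x20]) + digest     # sha2-256 code, digest length 32
    cid_bytes = bytes([0x01, 0x55]) + multihash  # CID version 1, raw codec

    # Multibase base32: lowercase, no padding, prefixed with 'b'.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


if __name__ == "__main__":
    print(cidv1_raw_from_sha256(hashlib.sha256(b"").hexdigest()))
    # prints bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku
```

For a file that was chunked, or added without raw leaves, the CID produced this way will simply not be the one the network knows the file under.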

Hey @hector,

wouldn’t it make sense to calculate the SHA256 hash of the whole file and store it as metadata which can be queried like a CID?

I’m sure there are many use cases where you know the SHA256 sum of a file and just want to receive it over IPFS, like reading a sha256sums.txt and adding all the files to a folder.

Or my use case, where many people have URLs and the hash sums and want to share the file over IPFS, but can’t change the file format to add an additional hash which is only useful when you’re using IPFS.

Best regards

Ruben

I understand the point, but IPFS is content-addressed, which means that the content needs to be self-verifying. If the original sha256 hash resolved to IPFS-chunked content that actually has a different hash, you would no longer be in content-addressed land. And if chunks did not exist, so that the hashes were the same, a big download could only be verified after the full download had completed.

I guess it would be possible to store the full-file sha256 as metadata (at the expense of hashing twice), but the mapping can be done by the user in a number of ways too (like the idea you had).

All in all, I think this topic has been discussed before, and probably many ideas and different reasons have been floated around, but I cannot find much on it right now.
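
To illustrate the difference, here is a simplified sketch that hashes a file once as a whole and once per 256 KiB chunk. It is not how IPFS actually builds its DAG (the root is a hash over a node that links the chunks, not shown here); it only shows that per-chunk hashes allow piecewise verification and that none of them equals the whole-file hash once there is more than one chunk. All names are made up for illustration.

```python
import hashlib
from typing import List

CHUNK_SIZE = 256 * 1024  # assumed default chunk size mentioned in this thread


def whole_file_digest(data: bytes) -> str:
    """What sha256sum would report: only checkable once all bytes are present."""
    return hashlib.sha256(data).hexdigest()


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """One SHA-256 per fixed-size chunk: each chunk can be verified on arrival."""
    return [
        hashlib.sha256(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]


if __name__ == "__main__":
    big = bytes(600 * 1024)      # 600 KiB of zeros -> 3 chunks
    print(len(chunk_digests(big)), whole_file_digest(big) in chunk_digests(big))
    small = b"fits in one chunk"
    print(chunk_digests(small) == [whole_file_digest(small)])
```

Only for a file that fits into a single chunk do the two views coincide, which is the raw-leaf special case mentioned earlier in the thread.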

I found a number of tickets on this after I started this topic.

But they haven’t really led to anything either.

Thanks for your insight!

this also happened to me while working on a mobile application