How to check whether a given ipfs_hash and its linked hashes are already downloaded?

Let’s say I have an ipfs_hash. Before doing ipfs get ipfs_hash, I want to check whether it is already downloaded. If it is a folder, I want to check its sub-folders’/files’ hashes as well.

Currently I am doing: ipfs refs local | grep <ipfs_hash>. Is this the correct way to do it, or is there an alternative solution?

=> Can I run ipfs ls on my local machine?

You could also check the pin command, but I’m not sure: https://docs.ipfs.io/guides/concepts/pinning/

If it’s pinned, I get the following output:

❯ ipfs pin ls QmRasf4XRJaMNMJnXTuEPRUJmHMcwcwa5VTYBPazRuq42C
QmRasf4XRJaMNMJnXTuEPRUJmHMcwcwa5VTYBPazRuq42C recursive

Let’s say it’s not pinned, but its sub-files could be pinned. Is it also possible to check the sub-files by only obtaining their hashes, without downloading the complete files?

You can get the refs for a folder with ipfs refs <CID> and compare them to your full pin list.

If you want to do this with a file (and the file’s blocks), you can do that as well with the blockstore subcommand.

The question is, what’s your use case? I mean, just out of curiosity - if it makes sense that more people might want to use IPFS that way, you might want to request a feature for this. :slight_smile:

When I do ipfs refs -r <CID> does it fetch any data from the network?

So basically I will turn the output of ipfs refs <CID> into a list, iterate over it, and check each entry with ipfs pin ls -t all <CID> (I am using type all); the ones that are found are considered to exist locally.
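A minimal sketch of that loop, assuming the ipfs binary is on PATH (the helper names refs, is_pinned, and split_by_pin are mine, not part of the ipfs CLI):

```python
import subprocess

def refs(cid):
    """Direct children of `cid`, one CID per output line of `ipfs refs`."""
    out = subprocess.run(["ipfs", "refs", cid],
                         capture_output=True, text=True, check=True).stdout
    return out.split()

def is_pinned(cid):
    """True if `ipfs pin ls --type=all <cid>` succeeds, i.e. the CID is pinned."""
    return subprocess.run(["ipfs", "pin", "ls", "--type=all", cid],
                          capture_output=True).returncode == 0

def split_by_pin(children, check=is_pinned):
    """Partition child CIDs into (pinned, unpinned) lists."""
    pinned = [c for c in children if check(c)]
    return pinned, [c for c in children if c not in pinned]
```

The check callable is injectable so the partitioning logic can be exercised without a running node.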

=> Can you give an example of the blockstore check?

Actually, I did request this as a feature, but it was considered a low-priority issue and ignored (Is possible to find out new added block numbers into local repo after each “ipfs add file”? · Issue #5826 · ipfs/kubo · GitHub). I am nervous about re-asking for a progress update.

Yes. When the CID is not available in the local data storage, it will be fetched from the network.

You can avoid this by stopping the daemon and running the command ‘offline’ on the local repo. Then refs will just terminate with

Error: merkledag: not found

Sure, but this also works ‘offline’ on a repo. If you run ipfs block stat <CID>, it will terminate with

Error: blockservice: key not found

if the key isn’t locally stored.
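Based on the two error messages above, a small helper can classify the result of such an offline check (a sketch; the marker strings are just the ones quoted in this thread and may differ between versions):

```python
# Error substrings quoted above for `ipfs refs` and `ipfs block stat`
# when a block is absent from the local repo.
NOT_LOCAL_MARKERS = ("merkledag: not found", "blockservice: key not found")

def looks_missing(returncode, stderr):
    """True if an offline ipfs command failed because the data is not local."""
    return returncode != 0 and any(m in stderr for m in NOT_LOCAL_MARKERS)
```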

I want to check it without fetching any data from the network. I mean, some metadata could be fetched, but not the complete data itself.

Terminating the daemon would be bad if other programs are using IPFS in parallel; that would break their processes.

Can I run two independent IPFS processes, one offline and the other online? Then I could make this check on the offline IPFS process. Or is it possible to force ipfs refs to run in offline mode?

I understand this, but this is currently not possible.

Can you give a concrete example where it’s necessary to ask for this information?

Well, the reason for this limitation is that to IPFS, everything is first and foremost one thing: blocks. So there’s no dedicated ‘metadata’ vs ‘data’ separation on the network side.

If you do an ipfs refs <CID>, only the blocks necessary to answer this ‘question’ will be fetched. If the CID belongs to a file, it’s likely that only the first block (the default block size is 256 KByte) will be fetched. If the CID belongs to a directory, the size might currently be up to 1 MB, depending on the number of files stored in the directory. If you have sharding enabled, the directory will be split into multiple blocks, and might thus be larger than 1 MB.

Note: A directory is usually pretty small.

As an example:

$ ipfs files ls /exampledir | wc -l
10225
$ ipfs block stat QmRa6nnXCu9BCMZ3PmSpd4aKrry35otbwWR8umK3H994Fo
Key: QmRa6nnXCu9BCMZ3PmSpd4aKrry35otbwWR8umK3H994Fo
Size: 857028

True. It also takes a while until IPFS re-establishes its network connections.

Yes, but not on the same block store/datastore. If a process is already running, it will create a lock file.

Got it; if the data fetched for ipfs refs is around 1 MB, that is reasonable. So it won’t fetch the complete data blocks like when we do ipfs ls <CID>, right?

Hello,

you can always run ipfs --offline refs -r or ipfs --offline block stat etc., even if your normal daemon is running. With that you can check whether something is available locally or not.
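Wrapped in a script, that check could look like this (a sketch assuming the ipfs binary is on PATH; offline_cmd and locally_available are my hypothetical helper names, not ipfs commands):

```python
import subprocess

def offline_cmd(cid, sub=("block", "stat")):
    """Build the argument list for an offline check of `cid`."""
    return ["ipfs", "--offline", *sub, cid]

def locally_available(cid):
    """True if the block for `cid` is in the local repo. With --offline the
    command never touches the network and fails when the block is absent."""
    return subprocess.run(offline_cmd(cid),
                          capture_output=True).returncode == 0
```

The same offline_cmd builder works for the refs variant, e.g. offline_cmd(cid, sub=("refs", "-r")).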


Thank you @hector, that is what I was looking for all along. I can also do ipfs --offline ls, right?

Let’s assume I have only the parent hash, and ipfs --offline refs -r fails for it.

Now I want to check the hashes returned by ipfs refs -r one by one. When I do ipfs refs -r, IPFS won’t download the full data itself, right? Instead it should download only a small amount of data to obtain the results for ipfs refs -r or ipfs object stat, right?

It depends on the data. IPFS can traverse arbitrary DAGs. It is not the same to do refs -r on a file added with defaults, vs a file added with a trickle DAG, vs a file added with raw leaves, vs some random IPLD structure. In general, in order to list the children of a DAG node, you need to download that DAG node.

In the normal case, those DAG nodes which are not leaves will be relatively small. But I cannot ensure they are not downloaded with refs -r.

If you want to go little by little you can do ipfs refs (without the -r) and traverse manually. Does that help?
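That little-by-little traversal could be sketched like so (assuming a local ipfs binary; walk_local and offline_children are hypothetical helper names). Using --offline refs without -r lets you descend level by level and stop wherever a node is missing, so nothing is fetched from the network:

```python
import subprocess

def offline_children(cid):
    """Direct children of `cid` from the local repo only; None if the node
    itself is not available locally (so `ipfs --offline refs` fails)."""
    result = subprocess.run(["ipfs", "--offline", "refs", cid],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.split()

def walk_local(cid, children_of=offline_children):
    """Traverse level by level, yielding (cid, available) pairs; do not
    descend below missing nodes."""
    stack = [cid]
    while stack:
        current = stack.pop()
        kids = children_of(current)
        if kids is None:
            yield current, False
        else:
            yield current, True
            stack.extend(kids)
```

The children_of callable is injectable, so the traversal can be tried on a fake tree without a node running.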

I have observed that when I run

ipfs object stat <ipfs_hash>

and then ipfs --offline block stat <ipfs_hash>, the latter always returns valid information (does not give an error) even if the hash is not downloaded. How can I resolve this?