Please! Help me understand how IPFS DAG block storage works

I am testing the ipfs.dag API today, so I started an IPFS instance from Node.js and a go-ipfs instance from the terminal.

Node.js side

    const obj = {"simple":"object"}
    const cid = await ipfs.dag.put(obj, { pin: true })
    const res = cid.toBaseEncodedString('base58btc')
    console.log(`res is ${res}`)

I noted down the res value: zdpuAzE1oAAMpsfdoexcJv6PmL9UhE8nddUYGU32R98tzV5fv

From the go-ipfs side, I typed:

➜  client-demo (master) ✗ ipfs dag get zdpuAzE1oAAMpsfdoexcJv6PmL9UhE8nddUYGU32R98tzV5fv
{"simple":"object"}

Everything works!

However, the command ipfs dag get zdpuAzE1oAAMpsfdoexcJv6PmL9UhE8nddUYGU32R98tzV5fv returns a result only after a long time, or seems stuck, when I do the steps below:

  1. Stop the Node.js process (it had been running for several minutes)
  2. Execute ipfs repo gc

The weird thing I observed is that when I changed obj to the object below and repeated the steps above, the command never seems to return:

    const obj = {
      name: 'hello',
      value: 'world',
      next: ['zdpuAw1nQw3j8YoCDpSJRLCJHzQZoAHUmtN5Xib3AMyTDj3H8']
    }

One more thing I observed was that the ipfs dag get command returned immediately when I restarted the Node.js process. I guess the two IPFS nodes got connected and then the go-ipfs instance found the data.

So, the thing that confuses me is that it seems the pin doesn't work, because from my current understanding of IPFS it should not rely on the original node: the data should be stored in IPFS for other nodes to fetch. Is there anything else I'm missing for a DAG object stored in IPFS, besides setting pin to true? Or is it just slow, and I should keep Node.js running for a reasonable time so the data gets pinned?

Is ipfs.dag a client to go-ipfs or a js-ipfs instance? If it's the latter, the object is getting pinned on js-ipfs, not on the Go one. If you stop the original provider after GC-ing in go-ipfs, it is normal that it becomes unavailable.
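To make the distinction concrete, here is a rough sketch of the two setups, assuming the ipfs and ipfs-http-client npm packages (the exact constructor shape varies a bit between versions):

    // Option A: an in-process js-ipfs node. dag.put(obj, { pin: true }) pins
    // the block in THIS node's own repo, not in the go-ipfs repo.
    const IPFS = require('ipfs')
    const jsNode = await IPFS.create()

    // Option B: an HTTP client bound to the local go-ipfs daemon. dag.put here
    // stores and pins the block inside go-ipfs itself, so it survives a GC
    // as long as that daemon keeps the pin.
    const ipfsClient = require('ipfs-http-client')
    const goNode = ipfsClient({ host: '127.0.0.1', port: 5001, protocol: 'http' })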


The client is go-ipfs. So, would it still be available after GC if I used a js-ipfs client?

Why does the programming language of the client matter?

It would be like two different implementations of BitTorrent running on one machine, when no other seeders have started picking up the parts of the files yet. If you shut down the instance that is hosting the data, then the other instance can't see the data, because it's a totally separate instance and the only one that has the data is offline. Remember, each instance you run holds its own data. IPFS is not a big DB in the cloud; it's local and distributed, like BitTorrent.


@hector

One more question

I read the section below today from IPFS Tutorial | Regular Files API (Lesson 3) | ProtoSchool:

When you add a file to IPFS, you’re putting it only in your own node, but making it accessible by peers on the network. It will only remain available as long as someone who has it is connected to the network. If no one else has found and shared your file yet, and you shut off your computer or stop your IPFS daemon from running, that content will no longer be available for anyone to discover. The more people who share your content, through a process called pinning, the more likely it is to be available at any one time.

So, let's say I use the ipfs.add command to add a file. In order to keep it available at all times, what do I need to do? My guess is to have as many machines as possible use ipfs.cat to get the file. If my guess is right, I think I also need to pin the file on each machine once it gets the file content, right?

Make sure it is available on as many IPFS nodes as you can, or use a pinning service like Pinata.
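For the nodes you control, pinning the CID is enough: pin.add fetches the blocks from the network if they are not already local and protects them from ipfs repo gc (ipfs.cat alone only caches the blocks, and the cache can later be garbage-collected). A minimal sketch, assuming ipfs is a js-ipfs instance or HTTP client on each of those nodes and the CID is a placeholder:

    // Run on every node that should keep the file available.
    const cid = 'QmYourFileCidHere'   // placeholder: the CID returned by ipfs.add
    await ipfs.pin.add(cid)           // fetches the blocks (recursively) and pins them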

Thanks for the explanation. I got it.

I have a new question: let's say I have built a system/application based on IPFS, so I can control every node of my system. What is an efficient way to make data available on many IPFS nodes? Just using commands like ipfs.add and ipfs.cat seems inefficient, since that way every node would store the entire file for each of the others. What I want to achieve is something like storing small pieces of data on many nodes for one file, for example using a DHT-like algorithm to distribute the data. Is there any API I can use, or must I implement my own algorithm to do so?

@hector Hector, I am looking forward to your advice on this. Thanks a lot!

English is not my native language! Please forgive me if I make some grammar mistakes or cause confusion.

If the problem is that you don't want the data to be on too many nodes, but you do want to use it on those nodes, then do not run so many IPFS nodes and just fetch it from a remote node via the API/gateway.
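For example, a minimal sketch of reading through one remote node's HTTP API with ipfs-http-client (the host, port and CID are placeholders; cat returns an async iterable in recent versions):

    const ipfsClient = require('ipfs-http-client')
    const remote = ipfsClient({ host: 'my-ipfs-node.example.com', port: 5001, protocol: 'http' })

    // Stream the file from the remote node instead of running a local one.
    let data = Buffer.alloc(0)
    for await (const chunk of remote.cat('QmSomeCid')) {
      data = Buffer.concat([data, chunk])
    }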

Let me rephrase my problem:
I am developing a desktop application for collaborative text editing based on IPFS. When a user launches the desktop application, it will start an IPFS node.

I've designed an algorithm to elect a super node among a group of running application instances. The super node will choose appropriate nodes to save the data when a user saves a text file. This is a decentralized design: the data will be saved on the nodes of my system.

So, I don't want to fetch the data from a centralized remote node (please correct me if I'm understanding this incorrectly).

What I want is something like this:

    const nodes             // nodes that are already connected
    const data_segments     // an array of data segments, maybe like a Merkle DAG
    replicate(nodes, data_segments)   // the data will be sent to the nodes

Use pubsub to announce the things that the nodes need to replicate, and let the nodes subscribe to a pubsub topic for things to replicate.

Via pubsub you can either send CIDs for large things, or send the data they need to replicate directly for smaller pieces.
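A rough sketch of that pattern, assuming js-ipfs instances with pubsub enabled (the topic name, node handles and CID are placeholders):

    const topic = 'my-app-replication'

    // On the node deciding what gets replicated (e.g. your super node):
    await superNode.pubsub.publish(topic, Buffer.from('QmCidToReplicate'))

    // On every node that should hold a copy:
    await peerNode.pubsub.subscribe(topic, async (msg) => {
      const cid = Buffer.from(msg.data).toString()  // msg.data may be a Buffer or Uint8Array
      await peerNode.pin.add(cid)                   // fetch the blocks and protect them from GC
    })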