Does IPFS pinning announce only the top-level CID to the DHT, or all of its links too?

Do DHT.provide(cid) and DHT.reprovide(cid) announce to the nearby peers only the top-level CID, or also all of its links?

Example - 1.1MB file:

ipfs add <file>   # QmVCnUnL1w553ae5H1NpX6tfn6CkMvdfqbemWqdmSgGYLN

ipfs object get QmVCnUnL1w553ae5H1NpX6tfn6CkMvdfqbemWqdmSgGYLN | jq
{
  "Links": [
    {
      "Name": "",
      "Hash": "QmdCy1sJEpXhggCZkqDeuSqe4sdAiH7aMrPLH2A4YxX2SP",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmQau37aEynAKtAmM6EdGFDu4iXAF6Eyzsu9KyZJ8YxL4G",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmTrVEozuyzBMfDhB7RK45quu1yJjSBvHooQjeAnwn3Y49",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmQQB5TbHy1wHNrwsZ4RiEonfMcaViJEzYWxWys592XzRe",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmSyCK88w52vhWxgerd76TPFAxi9AaHFXAxwvNLu6RLX1Y",
      "Size": 219444
    }
  ],
  "Data": "\b\u0002\u0018��M ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\r"
}
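For context on why the file splits into five links: the default go-ipfs chunker cuts files into fixed 256 KiB raw chunks (the Size values above are slightly larger because they include dag-pb framing). A minimal sketch of the chunk arithmetic, assuming a 1.1 MB file:

```go
package main

import "fmt"

// chunkCount mimics the default go-ipfs size splitter: fixed 256 KiB chunks,
// with a partial final chunk still occupying its own block.
func chunkCount(fileSize, chunkSize int) int {
	return (fileSize + chunkSize - 1) / chunkSize // integer ceiling division
}

func main() {
	const chunkSize = 256 * 1024 // 262144 bytes, the go-ipfs default chunk size
	fileSize := 1_100_000        // assumed byte count for the ~1.1 MB example file
	fmt.Printf("%d leaf blocks linked from the root node\n", chunkCount(fileSize, chunkSize))
}
```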

When the file is added and automatically pinned, the node also announces it to the DHT:

	wg := sync.WaitGroup{}
	for p := range peers {
		wg.Add(1)
		go func(p peer.ID) {
			defer wg.Done()
			logger.Debugf("putProvider(%s, %s)", loggableProviderRecordBytes(keyMH), p)
			err := dht.sendMessage(ctx, p, mes)
			if err != nil {
				logger.Debug(err)
			}
		}(p)
	}
	wg.Wait()
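For background on what "announcing to the DHT" means here: the loop above sends the provider record to the peers whose IDs are closest to the key's multihash by XOR distance. A toy sketch of that closest-peer selection (hypothetical peer ID strings, SHA-256 instead of real multihashes, and K shrunk from the DHT's usual 20):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// closestPeers returns the k peer IDs whose SHA-256-hashed IDs are closest
// to the key's hash by XOR distance, Kademlia-style. The peer IDs here are
// hypothetical strings; real libp2p peer IDs are multihashes and K is 20.
func closestPeers(key string, peers []string, k int) []string {
	target := sha256.Sum256([]byte(key))
	xor := func(id string) []byte {
		h := sha256.Sum256([]byte(id))
		d := make([]byte, len(h))
		for i := range d {
			d[i] = h[i] ^ target[i]
		}
		return d
	}
	sorted := append([]string(nil), peers...) // don't mutate the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		return bytes.Compare(xor(sorted[i]), xor(sorted[j])) < 0
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	peers := []string{"PeerA", "PeerB", "PeerC", "PeerD"} // hypothetical IDs
	near := closestPeers("QmVCnUnL1w553ae5H1NpX6tfn6CkMvdfqbemWqdmSgGYLN", peers, 2)
	fmt.Println("provider record stored on:", near)
}
```

Each CID you provide gets its own record placed on a different set of "near" peers, which is why providing a root and providing its links are independent operations.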

It seems like this only announces the file's root CID QmVCnUnL1w553ae5H1NpX6tfn6CkMvdfqbemWqdmSgGYLN and not its links?

Q1: Does Peer1 announce to the DHT only the top-level CID QmVCnUnL1w553ae5H1NpX6tfn6CkMvdfqbemWqdmSgGYLN, or also all of its links?

Q2: How does it work when Peer2 is searching for this file?

Will Peer2 execute Bitswap's wantBlocks(QmVCnUnL1w553ae5H1NpX6tfn6CkMvdfqbemWqdmSgGYLN) (the top-level file CID) via the WantManager, and if Peer1 answers the request, will it send the file as the 5 blocks shown above?

Q3: How does it work when Peer3 happens to have part of the file?

Let’s say Peer3 has one of the blocks, QmdCy1sJEpXhggCZkqDeuSqe4sdAiH7aMrPLH2A4YxX2SP (the first link in the original file), for whatever random reason.

Can Peer2’s Bitswap fetch 4 blocks from Peer1 and 1 block from Peer3? Or will that not happen, because Peer3 would have had to have the entire file and have announced the top-level CID QmVCnUnL1w553ae5H1NpX6tfn6CkMvdfqbemWqdmSgGYLN to the DHT in the past?

Q1: Does Peer1 announce to the DHT only the top-level CID QmVCnUnL1w553ae5H1NpX6tfn6CkMvdfqbemWqdmSgGYLN, or also all of its links?

IIRC everything initially, but over time that depends on configuration. The default is to reprovide all blocks in your blockstore.

Bitswap in go-ipfs has this concept of “sessions” where it first asks peers for data if it thinks they have related data (e.g. from the same DAG download request). So once it finds someone with the root block, it’ll likely ask them for the rest of the blocks too (assuming they have them). Periodically Bitswap will go to the DHT to look for extra providers (e.g. in case more people have a subDAG than the whole DAG).

Bitswap also has a broadcast component: if nobody in a session has a block, it’ll ask everyone you’re connected to whether they have the block, and if they do, it adds them to the session.

If everyone has announced all blocks to the DHT, then Peer2 might end up getting all 5 blocks from Peer1, or 4 from Peer1 and 1 from Peer3, depending on timing.

Note: there’s currently an issue with Bitswap sessions in go-ipfs not really knowing the context of a request. For example, if I’ve downloaded the root block previously I’ll start searching for people who have the child blocks but won’t ask the DHT who has the root block. This means if someone has configured their node to only provide root blocks I might miss them in trying to download the data.
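The session behavior described above, "ask session peers first, then broadcast and grow the session", can be sketched as a toy model. This is not the go-bitswap API; the peer and block names are hypothetical, and the real DHT-provider fallback is reduced to a comment:

```go
package main

import "fmt"

// toySession models a Bitswap session: a set of peers believed to have
// related blocks, plus (for this sketch) a map of what each connected
// peer actually holds.
type toySession struct {
	peers map[string]bool            // peers currently in the session
	have  map[string]map[string]bool // connected peer -> blocks it holds
}

func (s *toySession) fetch(block string) (string, bool) {
	// 1. Ask peers already in the session first.
	for p := range s.peers {
		if s.have[p][block] {
			return p, true
		}
	}
	// 2. Broadcast: ask every connected peer; responders join the session.
	for p, blocks := range s.have {
		if blocks[block] {
			s.peers[p] = true
			return p, true
		}
	}
	return "", false // 3. Real Bitswap would now query the DHT for providers.
}

func main() {
	s := &toySession{
		peers: map[string]bool{"Peer1": true}, // Peer1 served the root block
		have: map[string]map[string]bool{
			"Peer1": {"root": true, "link2": true, "link3": true, "link4": true, "link5": true},
			"Peer3": {"link1": true}, // Peer3 only has the first child block
		},
	}
	for _, b := range []string{"link1", "link2"} {
		if p, ok := s.fetch(b); ok {
			fmt.Printf("%s served by %s\n", b, p)
		}
	}
}
```

In this model, Peer3 is discovered via the broadcast for link1 and joins the session, after which subsequent blocks can come from either peer, which mirrors the "depending on timing" point above.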

Thanks Adin! Very informative response!!!

I checked the different strategies and now I see the logic:

// SimpleProviders creates the simple provider/reprovider dependencies
func SimpleProviders(reprovideStrategy string, reprovideInterval string) fx.Option {
	reproviderInterval := kReprovideFrequency
	if reprovideInterval != "" {
		dur, err := time.ParseDuration(reprovideInterval)
		if err != nil {
			return fx.Error(err)
		}

		reproviderInterval = dur
	}

	var keyProvider fx.Option
	switch reprovideStrategy {
	case "all":
		fallthrough
	case "":
		keyProvider = fx.Provide(simple.NewBlockstoreProvider)
	case "roots":
		keyProvider = fx.Provide(pinnedProviderStrategy(true))
	case "pinned":
		keyProvider = fx.Provide(pinnedProviderStrategy(false))
	default:
		return fx.Error(fmt.Errorf("unknown reprovider strategy '%s'", reprovideStrategy))
	}

	return fx.Options(
		fx.Provide(ProviderQueue),
		fx.Provide(SimpleProvider),
		keyProvider,
		fx.Provide(SimpleReprovider(reproviderInterval)),
	)
}
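The strategies in the switch above boil down to which keys get (re)announced. A simplified mirror of that mapping (illustrative descriptions, not the actual go-ipfs provider code):

```go
package main

import (
	"errors"
	"fmt"
)

// strategyKeys mirrors the switch in SimpleProviders: which keys are
// announced to the DHT for each Reprovider.Strategy value.
func strategyKeys(strategy string) (string, error) {
	switch strategy {
	case "", "all":
		return "every block in the blockstore", nil
	case "roots":
		return "only the root CIDs of pinned content", nil
	case "pinned":
		return "all blocks of pinned content", nil
	default:
		return "", errors.New("unknown reprovider strategy '" + strategy + "'")
	}
}

func main() {
	for _, s := range []string{"", "all", "roots", "pinned", "bogus"} {
		keys, err := strategyKeys(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("strategy %q -> %s\n", s, keys)
	}
}
```

In go-ipfs these values correspond to the Reprovider.Strategy config field, with "all" as the default, which matches Adin's point that by default everything in the blockstore gets reprovided.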

Seems like when an object is added/pinned, the node announces the root + links to the DHT immediately.

~ ipfs add ./Pictures/morning.png
added QmZT6V41XKKM7crS7c6RDWyCkgSGUAk8zfC9azU8Y1H8nk morning.png

15:01:54.340+0100    DEBUG        bitswap Bitswap.ProvideWorker.Start     {"ID": 2, "cid": "QmZT6V41XKKM7crS7c6RDWyCkgSGUAk8zfC9azU8Y1H8nk"}

But reproviding didn’t start after my reconfigured reprovide interval of 15 minutes, so I tried to access the CID, and then it started periodically re-providing ALL the pinned blocks.

ipfs object get QmZT6V41XKKM7crS7c6RDWyCkgSGUAk8zfC9azU8Y1H8nk | jq

{
  "Links": [
    {
      "Name": "",
      "Hash": "Qme8M83J5mr4hmYgA3cDCyXBaBVykWPg9jCJiWUqYRGhvK",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmWD9ai3B7FRV6DAr6DR252LTH2rbLdPnH7wb2Bsy9uYaJ",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmT5toP4Y2KXHxC465uhvXpVuJmoDZNDLiVDu7tjcMY1PF",
      "Size": 245997
    }
  ],
  "Data": "\b\u0002\u0018߁/ ��\u0010 ��\u0010 ߁\u000f"
}

15:24:34.383+0100    DEBUG        cmds/http       incoming API request: /object/get?arg=QmZT6V41XKKM7crS7c6RDWyCkgSGUAk8zfC9azU8Y1H8nk&data-encoding=text&encoding=json&stream-channels=true

At 15:50 it got re-published again:

15:50:42.232+0100    DEBUG        dht     providing       {"cid": "QmZT6V41XKKM7crS7c6RDWyCkgSGUAk8zfC9azU8Y1H8nk", "mh": "bciqkkfzpkkcbidjfkuk242li5usqwcq4ovkdqxgdikwk24x4jvqts6i"}

At 16:21 it got re-published again:

16:21:51.538+0100    DEBUG        dht     providing       {"cid": "QmZT6V41XKKM7crS7c6RDWyCkgSGUAk8zfC9azU8Y1H8nk", "mh": "bciqkkfzpkkcbidjfkuk242li5usqwcq4ovkdqxgdikwk24x4jvqts6i"}

Let’s have a look at the DAG links.

Surprisingly, the first DAG link got published to the DHT at the same time as the root CID:

15:01:54.348+0100    DEBUG        dht     providing       {"cid": "Qme8M83J5mr4hmYgA3cDCyXBaBVykWPg9jCJiWUqYRGhvK", "mh": "bciqovfbcsgglwanihmffflgk5ad2zal2z4egbgkpfnvlw3nynivwi3a"}

15:11:11.594+0100    DEBUG        dht     providing       {"cid": "Qme8M83J5mr4hmYgA3cDCyXBaBVykWPg9jCJiWUqYRGhvK", "mh": "bciqovfbcsgglwanihmffflgk5ad2zal2z4egbgkpfnvlw3nynivwi3a"}

15:56:08.331+0100    DEBUG        dht     providing       {"cid": "Qme8M83J5mr4hmYgA3cDCyXBaBVykWPg9jCJiWUqYRGhvK", "mh": "bciqovfbcsgglwanihmffflgk5ad2zal2z4egbgkpfnvlw3nynivwi3a"}

16:25:39.343+0100    DEBUG        dht     providing       {"cid": "Qme8M83J5mr4hmYgA3cDCyXBaBVykWPg9jCJiWUqYRGhvK", "mh": "bciqovfbcsgglwanihmffflgk5ad2zal2z4egbgkpfnvlw3nynivwi3a"}

The second DAG link was also published at the same time as the root CID:

15:01:54.348+0100    DEBUG        bitswap Bitswap.ProvideWorker.Start     {"ID": 4, "cid": "QmWD9ai3B7FRV6DAr6DR252LTH2rbLdPnH7wb2Bsy9uYaJ"}

15:18:01.160+0100    DEBUG        dht     providing       {"cid": "QmWD9ai3B7FRV6DAr6DR252LTH2rbLdPnH7wb2Bsy9uYaJ", "mh": "bciqhj4l6x7f6zht4xjt6h66slfgujxytuhwm4twtsjhkhljut3xlcby"}

16:02:09.265+0100    DEBUG        dht     providing       {"cid": "QmWD9ai3B7FRV6DAr6DR252LTH2rbLdPnH7wb2Bsy9uYaJ", "mh": "bciqhj4l6x7f6zht4xjt6h66slfgujxytuhwm4twtsjhkhljut3xlcby"}

16:30:36.201+0100    DEBUG        dht     providing       {"cid": "QmWD9ai3B7FRV6DAr6DR252LTH2rbLdPnH7wb2Bsy9uYaJ", "mh": "bciqhj4l6x7f6zht4xjt6h66slfgujxytuhwm4twtsjhkhljut3xlcby"}

The third and last DAG link was also published at the same time as the root CID:

15:01:54.348+0100    DEBUG        bitswap Bitswap.ProvideWorker.Start     {"ID": 5, "cid": "QmT5toP4Y2KXHxC465uhvXpVuJmoDZNDLiVDu7tjcMY1PF"}

15:38:47.522+0100    DEBUG        dht     providing       {"cid": "QmT5toP4Y2KXHxC465uhvXpVuJmoDZNDLiVDu7tjcMY1PF", "mh": "bciqenavhxfskg6cjtl5t2kc7uzivhexhsc5lukt3ivbwl75iwtjpmaq"}

16:11:08.877+0100    DEBUG        dht     providing       {"cid": "QmT5toP4Y2KXHxC465uhvXpVuJmoDZNDLiVDu7tjcMY1PF", "mh": "bciqenavhxfskg6cjtl5t2kc7uzivhexhsc5lukt3ivbwl75iwtjpmaq"}

16:36:54.299+0100    DEBUG        dht     providing       {"cid": "QmT5toP4Y2KXHxC465uhvXpVuJmoDZNDLiVDu7tjcMY1PF", "mh": "bciqenavhxfskg6cjtl5t2kc7uzivhexhsc5lukt3ivbwl75iwtjpmaq"}

NOTE: The timing looks a bit irregular, but it seems to work. It could be off because the DHT wasn’t connected yet, because of dialing, some retry mechanism, or other session reasons.

I put together these extra debug steps to understand what’s happening behind the scenes. If anything else important comes to your mind about the DHT publishing interval, I’d be curious to hear it, @adin.

Thanks for the insights.