Find Content Faster - Willing to pay anyone who can solve this for me :)

Hey all -

I’m new to IPFS, so please excuse my ignorance. I’m working on a tool involving NFT metadata. For the tool to work well, I need to be able to pull metadata very quickly after the URI is released. I am currently using the IPFS CLI “get” command to pull an entire URI directory - but it often takes over a minute for IPFS to find a node with the data available.

However, there are some tools online that can get the data from IPFS almost instantly (within seconds of the URI being released) - how are they doing this? For example, even if I use one of the public gateways listed at Public Gateway Checker | IPFS, they almost always have the content available right away. What do I need to do to accomplish this on my own IPFS node? Do I need to configure my own swarm so that I have more peers available to search from?

Any help is greatly appreciated. I would happily pay anyone who is able to give me a solution to this in the form of an Amazon giftcard or crypto. Thanks!!

The (xor) distance between a node and another with data makes all the difference in the world.
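For readers unfamiliar with the term, here is a minimal sketch of what XOR distance means in a Kademlia-style DHT such as the one IPFS uses. It assumes keys are mapped into a 256-bit keyspace with SHA-256 before comparison; the byte strings used as peer and content keys are placeholders, not real peer IDs or CIDs, and the function names are made up for illustration.

```python
import hashlib


def kad_key(raw):
    """Map raw key bytes into a 256-bit keyspace (assumption: SHA-256)."""
    return int.from_bytes(hashlib.sha256(raw).digest(), "big")


def xor_distance(a, b):
    """Kademlia-style distance: XOR of the two hashed keys, read as an integer."""
    return kad_key(a) ^ kad_key(b)


def closest_peers(target, peers, k=3):
    """Return the k peers whose hashed keys are XOR-closest to the target key."""
    return sorted(peers, key=lambda p: xor_distance(target, p))[:k]


# Placeholder identifiers, purely illustrative.
peers = [b"peer-a", b"peer-b", b"peer-c", b"peer-d", b"peer-e"]
nearest = closest_peers(b"some-content-key", peers, k=2)
```

The distance is about positions in the keyspace and has nothing to do with geography or network latency: a lookup walks towards the peers whose keys are closest to the content key, which is where provider records are expected to be stored.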

Transferring data between peers is fast, discovery not so much.

Also, a theory of mine is that nodes that run for a long time tend to connect to each other, since temporary nodes come and go.

IDK how those other tools work, but be careful when comparing; sometimes a system can be centralised.

Thanks for your response! I have talked to a dev for one of the services I mentioned and he confirmed that they use IPFS to pull the data.

I just came across this post, and it sounds like he is encountering a similar issue.

As an example, this CID is available on the given peer. I have added this peer via ipfs swarm connect and I’ve also added it in the peering config file. However, when I run ipfs pin add or ipfs get, it still takes over a minute to discover the data. How is this possible if I am directly connected to a peer which I know has it available?

@clabowman as mentioned in the other post you need to figure out who is hosting your content.

QmcfgsJsMtx6qJb74akCw1M24X1zFwgGo11h1cuhwQjtJP is Cloudflare, and unless I’m mistaken the data you are looking for is not being hosted by Cloudflare since IIUC they just run a gateway :upside_down_face:.

So:

  1. Figure out who has your content. Where did you get this CID from, and who is supposed to have it? If the publisher of some NFT didn’t publish their CID correctly (e.g. their computer with the data is offline), you’re going to have some trouble.
  2. Use ipfs-check to figure out whether they have the data and are advertising it.
  3. Try connecting to them and downloading (particularly if you’re planning on peering with them directly and they haven’t advertised the data in the DHT … which will cause problems for many users trying to find the content).
  4. If the data is a directory with many files, you can try the optimization suggested in the other thread: instead of doing ipfs get, do ipfs pin add followed by ipfs get --archive, then untar the data. This generally isn’t necessary though.

If you’re still having trouble afterwards, report back :grinning:. Remember that data gets cached locally, so if you’re truly trying to test speed you’ll want to make sure to unpin the data and run ipfs repo gc before trying to pin it again.
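The cache-clearing advice above can be sketched as a small timing helper. This is only an illustration: the function names are made up, it assumes the ipfs binary is on PATH with a running daemon, and it uses ipfs pin rm as the unpin step. The runner parameter exists only so the sequence can be exercised without a node.

```python
import subprocess
import time


def cold_fetch_commands(cid):
    """Command sequence for a cold-cache test: unpin, garbage-collect, re-pin."""
    return [
        ["ipfs", "pin", "rm", cid],   # unpin (exits non-zero if the CID was not pinned)
        ["ipfs", "repo", "gc"],       # drop unpinned blocks from the local repo
        ["ipfs", "pin", "add", cid],  # the fetch being measured
    ]


def run_timed(cmd, runner=subprocess.run):
    """Run one command and return (exit code, elapsed seconds)."""
    start = time.monotonic()
    result = runner(cmd, capture_output=True, text=True)
    return result.returncode, time.monotonic() - start


def time_cold_fetch(cid, runner=subprocess.run):
    """Run the whole sequence; return (command, exit code, seconds) per step."""
    timings = []
    for cmd in cold_fetch_commands(cid):
        code, elapsed = run_timed(cmd, runner)
        timings.append((" ".join(cmd), code, elapsed))
    return timings
```

Calling time_cold_fetch with a CID and printing the last entry gives the wall-clock time of the pin on an empty cache, which is the number worth comparing between runs. Note that ipfs repo gc removes every unpinned block in the repo, not just the ones for this CID.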


In theory you could try connecting to the list of nodes on docs.ipfs.io designed for gateway usage referenced in that other thread (Peering with content providers | IPFS Docs) … but it’s not really a winning strategy since that list could get updated from time to time and you’d be out of luck. Also if enough people do this and try to abuse connections to infrastructure nodes any of those infra providers could start blocking you.

That being said if some of those content providers aren’t properly advertising their data in the DHT it might be worth peering with them just in case someone with an NFT only stores it with one of those service providers.

First off - thank you so much for taking the time to respond! I greatly appreciate it.

A couple of things - I know that it is common for NFT projects to work with Pinata and nft.storage. I am peering with both providers (as described in the document) but it doesn’t seem to help much. I still sometimes get stuck waiting 5+ minutes for IPFS to find the content. Once it’s found, the download is generally pretty quick.
But - here is where I’m getting stuck. Let’s assume that I can’t directly find out who is hosting the content (generally the only thing the projects release is the CID). I know the cloudflare gateway for example is able to find the content instantly because I can get the data from their HTTP gateway immediately after the CID is released. So how do they do it?

I was considering setting up my own cloud based IPFS cluster. Would that potentially give me quicker access to the data I’m looking for?

Has anyone solved your problem yet? Will you pay me if I tell you a simple trick to make discovery faster: build a tracker for IPFS. You still have the benefit of decentralization, because people don’t have to use it, but if they do it will be much faster (in discovery).

DHT is central tracker

Still stuck on this… any help greatly appreciated. Thanks!

You haven’t really given anyone something they can replicate. If you post your config file (the output of ipfs config show, which strips out things like private keys) and some CIDs you’re having trouble getting, that’d probably make it easier for people to help you out.

Sure,

Here is my config:

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": true,
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "12D3KooWSCeqhk4QwzT3EPMkq22Qm2eaMwJFiCwsvnZRi11Xx3Cc"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": [
      {
        "Addrs": [
          "/ip6/2606:4700:60::6/tcp/4009",
          "/ip4/172.65.0.13/tcp/4009"
        ],
        "ID": "QmcfgsJsMtx6qJb74akCw1M24X1zFwgGo11h1cuhwQjtJP"
      },
      {
        "Addrs": [
          "/dnsaddr/fra1-1.hostnodes.pinata.cloud"
        ],
        "ID": "QmWaik1eJcGHq1ybTWe7sezRfqKNcDRNkeBaLnGwQJz1Cj"
      },
      {
        "Addrs": [
          "/dnsaddr/fra1-2.hostnodes.pinata.cloud"
        ],
        "ID": "QmNfpLrQQZr5Ns9FAJKpyzgnDL2GgC6xBug1yUZozKFgu4"
      },
      {
        "Addrs": [
          "/dnsaddr/fra1-3.hostnodes.pinata.cloud"
        ],
        "ID": "QmPo1ygpngghu5it8u4Mr3ym6SEU2Wp2wA66Z91Y1S1g29"
      },
      {
        "Addrs": [
          "/dnsaddr/nyc1-1.hostnodes.pinata.cloud"
        ],
        "ID": "QmRjLSisUCHVpFa5ELVvX3qVPfdxajxWJEHs9kN3EcxAW6"
      },
      {
        "Addrs": [
          "/dnsaddr/nyc1-2.hostnodes.pinata.cloud"
        ],
        "ID": "QmPySsdmbczdZYBpbi2oq2WMJ8ErbfxtkG8Mo192UHkfGP"
      },
      {
        "Addrs": [
          "/dnsaddr/nyc1-3.hostnodes.pinata.cloud"
        ],
        "ID": "QmSarArpxemsPESa6FNkmuu9iSE1QWqPX2R3Aw6f5jq4D5"
      },
      {
        "Addrs": [
          "/dns/cluster0.fsn.dwebops.pub"
        ],
        "ID": "QmUEMvxS2e7iDrereVYc5SWPauXPyNwxcy9BXZrC1QTcHE"
      },
      {
        "Addrs": [
          "/dns/cluster1.fsn.dwebops.pub"
        ],
        "ID": "QmNSYxZAiJHeLdkBg38roksAR9So7Y5eojks1yjEcUtZ7i"
      },
      {
        "Addrs": [
          "/dns/cluster2.fsn.dwebops.pub"
        ],
        "ID": "QmUd6zHcbkbcs7SMxwLs48qZVX3vpcM8errYS7xEczwRMA"
      },
      {
        "Addrs": [
          "/dns/cluster3.fsn.dwebops.pub"
        ],
        "ID": "QmbVWZQhCGrS7DhgLqWbgvdmKN7JueKCREVanfnVpgyq8x"
      },
      {
        "Addrs": [
          "/dns/cluster4.fsn.dwebops.pub"
        ],
        "ID": "QmdnXwLrC8p1ueiq2Qya8joNvk3TVVDAut7PrikmZwubtR"
      },
      {
        "Addrs": [
          "/dns4/nft-storage-am6.nft.dwebops.net/tcp/18402"
        ],
        "ID": "12D3KooWCRscMgHgEo3ojm8ovzheydpvTEqsDtq7Vby38cMHrYjt"
      },
      {
        "Addrs": [
          "/dns4/nft-storage-dc13.nft.dwebops.net/tcp/18402"
        ],
        "ID": "12D3KooWQtpvNvUYFzAo1cRYkydgk15JrMSHp6B6oujqgYSnvsVm"
      },
      {
        "Addrs": [
          "/dns4/nft-storage-sv15.nft.dwebops.net/tcp/18402"
        ],
        "ID": "12D3KooWQcgCwNCTYkyLXXQSZuL5ry1TzpM8PRe9dKddfsk1BxXZ"
      },
      {
        "Addrs": [
          "/ip4/139.178.69.155/tcp/4001"
        ],
        "ID": "12D3KooWR19qPPiZH4khepNjS3CLXiB7AbrbAD4ZcDjN1UjGUNE1"
      },
      {
        "Addrs": [
          "/ip4/139.178.68.91/tcp/4001"
        ],
        "ID": "12D3KooWEDMw7oRqQkdCJbyeqS5mUmWGwTp8JJ2tjCzTkHboF6wK"
      },
      {
        "Addrs": [
          "/ip4/147.75.33.191/tcp/4001"
        ],
        "ID": "12D3KooWPySxxWQjBgX9Jp6uAHQfVmdq8HG1gVvS1fRawHNSrmqW"
      },
      {
        "Addrs": [
          "/ip4/147.75.32.73/tcp/4001"
        ],
        "ID": "12D3KooWNuoVEfVLJvU3jWY2zLYjGUaathsecwT19jhByjnbQvkj"
      },
      {
        "Addrs": [
          "/ip4/145.40.89.195/tcp/4001"
        ],
        "ID": "12D3KooWSnniGsyAF663gvHdqhyfJMCjWJv54cGSzcPiEMAfanvU"
      },
      {
        "Addrs": [
          "/ip4/136.144.56.153/tcp/4001"
        ],
        "ID": "12D3KooWKytRAd2ujxhGzaLHKJuje8sVrHXvjGNvHXovpar5KaKQ"
      }
    ]
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "all"
  },
  "Routing": {
    "Type": "dht"
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {
      "GracePeriod": "20s",
      "HighWater": 900,
      "LowWater": 600,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {},
    "RelayService": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

Here are the last two CIDs I’ve pulled:

QmXAXu6Q6FJH9bGvnVt7Qs6KBh8538tgyv4gNubDj2ddKm
QmXQyUWciz8zLhtkfsFDHUyhzEezqaeA3Hw8wYVBPNviNa

But really, it seems that every CID I pull takes at least 2-3 minutes to discover… once discovered, the download takes anywhere from 30s to 10m. My question is: what are my options (if any) to guarantee a faster discovery/download time? It must be possible since I know of multiple services that consistently discover and download this content within seconds. Thanks again for the help!
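One thing that can be checked against a config like the one above is whether the peers listed under Peering.Peers are actually connected at the moment a fetch is slow. The sketch below compares the parsed output of ipfs config show with the text output of ipfs swarm peers. It assumes each line of that output is a multiaddr ending in the peer ID; the function names are hypothetical.

```python
def peering_ids(config):
    """Peer IDs listed under Peering.Peers in a parsed config dict."""
    peers = (config.get("Peering") or {}).get("Peers") or []
    return [p["ID"] for p in peers]


def connected_ids(swarm_peers_output):
    """Peer IDs taken from the last path component of each multiaddr line."""
    ids = set()
    for line in swarm_peers_output.splitlines():
        line = line.strip()
        if line:
            ids.add(line.rsplit("/", 1)[-1])
    return ids


def missing_peering(config, swarm_peers_output):
    """Configured peering peers that do not appear among the connected peers."""
    connected = connected_ids(swarm_peers_output)
    return [pid for pid in peering_ids(config) if pid not in connected]
```

An empty result only shows that the connections exist. It says nothing about whether those peers hold the requested CID.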

In your config file, did you try to put the peer in question first inside "Peering": { "Peers": [] }? Or even in gateway

In this post you talk about a given peer: Find Content Faster - Willing to pay anyone who can solve this for me :) - #3 by clabowman

I don’t see it in your config file

EDIT: Never mind you did it

Did you try removing all other peers?

By remove all other peers… do you mean from the config file?

Yes, delete all other peers from the config file, since in your case you only need one. Or try adding your peer to the bootstrap list.
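As a rough sketch of that suggestion, the helper below works on a parsed config dict (for example the JSON printed by ipfs config show), keeps a single entry in Peering.Peers, and optionally appends that peer to Bootstrap. The function name is made up, and the bootstrap entry format (address followed by /p2p/ and the peer ID) is inferred from the existing Bootstrap entries in the config posted earlier.

```python
import copy


def keep_only_peer(config, peer_id, add_to_bootstrap=False):
    """Return a copy of config whose Peering.Peers holds only the given peer."""
    new = copy.deepcopy(config)
    peers = (new.get("Peering") or {}).get("Peers") or []
    kept = [p for p in peers if p.get("ID") == peer_id]
    if not isinstance(new.get("Peering"), dict):
        new["Peering"] = {}
    new["Peering"]["Peers"] = kept

    if add_to_bootstrap:
        bootstrap = list(new.get("Bootstrap") or [])
        for peer in kept:
            for addr in peer.get("Addrs", []):
                # Bootstrap entries carry the peer ID inside the multiaddr.
                entry = f"{addr}/p2p/{peer['ID']}"
                if entry not in bootstrap:
                    bootstrap.append(entry)
        new["Bootstrap"] = bootstrap
    return new
```

The original dict is left untouched, so the trimmed version can be compared with it before anything is written back to the node.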

Tracker is just a DHT node, of course it can run without DHT

Tried shrinking my peer list and adding relevant peers to bootstrap, but not really any improvement.

I think my main question is this - what is the bottleneck for content discovery (assuming I don’t know who is hosting it)? Is it a physical constraint on my end, like network throughput or compute power? Or is it the peers I’m connecting to? I’m just curious whether deploying a beefy cloud IPFS cluster could theoretically overcome the slow discovery, or if it’s just beyond my control.

It’s similar to downloading a torrent.
CPU / RAM / IO is rarely the bottleneck for sub-GB/s transfers; your internet bandwidth usually is, provided you have a bunch of peers seeding the content.
If there aren’t enough seeders providing the content you are likely to experience slow performance, but at least you get good scalability, and at some point it might just be a problem that solves itself.