Measurement-based Research Paper: "Mapping the Interplanetary Filesystem"

Hey IPFS-People,

In the last few months, we’ve performed some measurements of IPFS’ network layer; the results can be found in our paper.
Hopefully, the results are of interest and shed some light on previously unknown aspects of the network, so that IPFS can safely grow and improve! :slight_smile:

In a nutshell:

  • We wrote a crawler that traverses the Kademlia DHT by sending FindNode packets for each bucket of every node it encounters. The crawler is optimized for speed, so that the snapshots are as accurate as possible and are affected by churn as little as possible.
  • This allows us to enumerate the nodes in the network and look at the neighborhood graph. Results: a lot of nodes are behind (symmetric) NATs and only return their local IP addresses to the DHT, which is not really helpful for anybody…
    Also, connection durations are very short.
  • Furthermore, we ran some monitoring nodes with no connection limits to validate the crawl results and to get an idea of how many nodes are clients and how many actively support the DHT protocol (i.e., are not pure clients) — the latter being roughly 70%–80%.
  • We looked at the code and at the behavior of IPFS nodes with default settings, and argue that the parallel lookup of content through the DHT combined with querying our neighbors through Bitswap is very robust against attacks, but not ideal for performance — especially with so many neighbors.

We have some ideas for future work, and maybe the crawler can be integrated into a bigger “health of the IPFS network” tool that monitors what’s going on in the network.

Any comments or (critical) feedback would be highly appreciated! :slight_smile: