IPFS Daemon slows regular browsing to a crawl

I’m new to IPFS and I’d like to run a node on my home machine. However, whenever I have the daemon running it slows down my browser so much that I can’t justify running IPFS in the background.

I don’t understand exactly what is happening because there isn’t any major resource utilization happening. The following is a btop screenshot from when the daemon is running. The network isn’t being hammered by any measure, and there is hardly any CPU / disk usage.

If I kill the daemon, browsing becomes fast again, but the resource usage doesn’t seem to change much.

I also tried directly measuring ping times, but they don’t seem to be affected by whether the daemon is on:

$ ping google.com
PING google.com (142.250.80.14) 56(84) bytes of data.
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=1 ttl=117 time=18.1 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=2 ttl=117 time=19.9 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=3 ttl=117 time=20.9 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=4 ttl=117 time=17.4 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=5 ttl=117 time=22.5 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=6 ttl=117 time=21.1 ms

or off

$ ping google.com
PING google.com (142.250.80.14) 56(84) bytes of data.
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=1 ttl=117 time=23.2 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=2 ttl=117 time=23.5 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=3 ttl=117 time=25.2 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=4 ttl=117 time=18.3 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=5 ttl=117 time=19.7 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=6 ttl=117 time=17.8 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=7 ttl=117 time=23.6 ms
64 bytes from lga34s33-in-f14.1e100.net (142.250.80.14): icmp_seq=8 ttl=117 time=23.9 ms

I also did a speed test with the daemon on:

220.4 Mbps download

11.6 Mbps upload

Latency: 12 ms

And with the daemon off:

225.8 Mbps download

11.6 Mbps upload

Latency: 12 ms

I don’t understand it. Navigating to a web page is extremely slow with the daemon on (and fast when it is off), but actual measures of bandwidth don’t seem impacted at all.

I’m hoping there is some way to modify the config to mitigate this problem. I’m currently using the “lowpower” profile.
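For context, this is roughly how the profile gets applied and inspected with the ipfs CLI (a sketch assuming the go-ipfs `ipfs` binary; the daemon has to be restarted for config changes to take effect):

```shell
# Apply the lowpower profile (rewrites the repo config)
ipfs config profile apply lowpower

# Inspect the connection-manager settings it set
ipfs config Swarm.ConnMgr

# Restart the daemon so the new settings take effect
ipfs shutdown
ipfs daemon
```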

Relevant bits of the config are as follows:


  "Datastore": {
    "StorageMax": "100GB",
    "StorageGCWatermark": 90,
    "GCPeriod": "1h",
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "HashOnRead": false,
    "BloomFilterSize": 0
  },
  "Mounts": {
    "IPFS": "/ipfs",
    "IPNS": "/ipns",
    "FuseAllowOther": false
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Routing": {
    "Type": "dhtclient"
  },
  "Ipns": {
    "RepublishPeriod": "",
    "RecordLifetime": "",
    "ResolveCacheSize": 128
  },
  "Gateway": {
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "RootRedirect": "",
    "Writable": false,
    "PathPrefixes": [],
    "APICommands": [],
    "NoFetch": false,
    "NoDNSLink": false,
    "PublicGateways": null
  },
  "API": {
    "HTTPHeaders": {}
  },
  "Swarm": {
    "AddrFilters": null,
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {},
    "RelayService": {},
    "Transports": {
      "Network": {},
      "Security": {},
      "Multiplexers": {}
    },
    "ConnMgr": {
      "Type": "basic",
      "LowWater": 20,
      "HighWater": 40,
      "GracePeriod": "1m0s"
    }
  },
  "AutoNAT": {
    "ServiceMode": "disabled"
  },
  "Pubsub": {
    "Router": "",
    "DisableSigning": false
  },
  "Peering": {
    "Peers": null
  },
  "DNS": {
    "Resolvers": {}
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Provider": {
    "Strategy": ""
  },
  "Reprovider": {
    "Interval": "0",
    "Strategy": "all"
  },
  "Experimental": {
    "FilestoreEnabled": false,
    "UrlstoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "AcceleratedDHTClient": false
  },
  "Plugins": {
    "Plugins": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Internal": {}

Currently using

go-ipfs version: 0.12.0-rc1
Repo version: 12
System version: amd64/linux
Golang version: go1.16.12

I’m wondering if this might be an issue with my router configuration? I don’t know what else could be going on. Any ideas?

Get an RPi 4 for running IPFS. Low energy consumption, and you can simply forget about it.

You can lower the ipfs CPU priority, but it does not help much with interactive response. My IPFS nodes use about 10% CPU on a big Intel chip - it’s not much. Scheduling is simply broken in most OSes - Windows 10 and Linux for sure.

Yeah you should.

My IPFS node uses 4% on a Ryzen 3600 with >1000 peers.

You should check that you aren’t running cooperative scheduling. Preemptive is just a must.

I don’t think you should.
ARM doesn’t have memory safety and requires syncing the cache back to RAM when doing anything lock-related.
Golang, and go-ipfs even more so, use a LOT of locks; this means if you run go-ipfs on ARM-based CPUs you are very likely to run out of memory bandwidth.

You don’t need any ground-breaking performance from an RPi 4 IPFS node. All it does is feed the cloudflare and dweb.link caches every 8 hours.

Offloading everything to a Pi4 is a workaround (and I happen to have a Pi4, so I’ll try it), but it’s not a solution to the problem. I’d rather focus on why this behavior exists in the first place. However, it is worth noting that the issue is localized to just the one computer running the daemon on my network. I have a second machine and the browser is working at full speed. It’s just the machine that the daemon is running on that is slow.

I don’t know why you say scheduling is broken. I know it’s an NP-hard problem, so any solution that currently exists will be a heuristic with corner cases. I’m running stock Ubuntu 21.04. Is there a way to check what type of scheduling it uses? A quick google says it just uses “Completely Fair Scheduling” (The Linux Scheduler: a Decade of Wasted Cores | the morning paper),
but I don’t know if there is a way to change that to test preemptive-vs-cooperative.

To test if process priority had any effect, I turned the “niceness” of the ipfs daemon all the way up in htop (which should lower its priority). This didn’t seem to change anything (it really sucks that I don’t have a quantitative test for this yet; I should try to find one). I didn’t expect it to, because the CPU use was only 13%, so it’s not like chrome is struggling to find CPU time.

To put the nail in the CPU hypothesis coffin, I ran a benchmark with and without the daemon running. Specifically I ran: sysbench cpu run --threads=4 --cpu-max-prime=40000 (requires sudo apt install sysbench).

With the daemon on:

sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time


Prime numbers limit: 40000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  2414.73

General statistics:
    total time:                          10.0003s
    total number of events:              24151

Latency (ms):
         min:                                    1.64
         avg:                                    1.66
         max:                                    3.33
         95th percentile:                        1.67
         sum:                                39996.94

Threads fairness:
    events (avg/stddev):           6037.7500/5.45
    execution time (avg/stddev):   9.9992/0.00

With the daemon off:

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time


Prime numbers limit: 40000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  2414.38

General statistics:
    total time:                          10.0010s
    total number of events:              24149

Latency (ms):
         min:                                    1.65
         avg:                                    1.66
         max:                                    6.33
         95th percentile:                        1.67
         sum:                                39998.31

Threads fairness:
    events (avg/stddev):           6037.2500/4.32
    execution time (avg/stddev):   9.9996/0.00

Basically no difference. I’m just totally baffled by this.

What resource could opening chrome tabs (only on this machine; recall that chrome on other machines on the same network works just fine) possibly need that the IPFS daemon is throttling access to?

Another test. I used curl to download a 100MB file with and without the daemon. I don’t see a noticeable difference in download speed.

(pyenv3.8.6) joncrall@toothbrush:~$ curl http://ipv4.download.thinkbroadband.com/100MB.zip -o without.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  100M  100  100M    0     0  20.8M      0  0:00:04  0:00:04 --:--:-- 21.4M
(pyenv3.8.6) joncrall@toothbrush:~$ curl http://ipv4.download.thinkbroadband.com/100MB.zip -o with-daemon.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  100M  100  100M    0     0  19.7M      0  0:00:05  0:00:05 --:--:-- 22.9M

I ran the test in the opposite order with a 200MB file as well, same result:

(pyenv3.8.6) joncrall@toothbrush:~$ curl http://ipv4.download.thinkbroadband.com/200MB.zip -o with-daemon.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  200M  100  200M    0     0  23.4M      0  0:00:08  0:00:08 --:--:-- 27.9M
(pyenv3.8.6) joncrall@toothbrush:~$ 
(pyenv3.8.6) joncrall@toothbrush:~$ curl http://ipv4.download.thinkbroadband.com/200MB.zip -o without.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  200M  100  200M    0     0  15.3M      0  0:00:13  0:00:13 --:--:-- 27.9M

I also tried to write a python program that might serve as an MWE, but no dice; even using the requests module does not reproduce the issue.

Code is:

def main():
    import requests
    import ubelt as ub

    urls = [
        'https://google.com',
        'https://whitehouse.gov',
        'https://cmake.org',
        'https://stackoverflow.com/',
        'https://reddit.com',
        'https://github.com/ipfs/go-ipfs',
    ]
    times = {}
    for url in urls:
        with ub.Timer() as timer:
            resp = requests.get(url)
            print(resp.text)
        times[url] = timer.elapsed
    print('times = {}'.format(ub.repr2(times, nl=1)))
    total = sum(times.values())
    print('total = {!r}'.format(total))


if __name__ == '__main__':
    """
    CommandLine:
        python ~/misc/test_https_speed.py
    """
    main()

Output with daemon:

times = {
    'https://cmake.org': 0.3563604890368879,
    'https://github.com/ipfs/go-ipfs': 0.5216539880493656,
    'https://google.com': 0.3111111380858347,
    'https://reddit.com': 1.4294513480272144,
    'https://stackoverflow.com/': 0.26056272198911756,
    'https://whitehouse.gov': 0.3523064670152962,
}
total = 3.2314461522037163

Output without daemon:

times = {
    'https://cmake.org': 0.33759950601961464,
    'https://github.com/ipfs/go-ipfs': 0.2698422559769824,
    'https://google.com': 0.27440492797177285,
    'https://reddit.com': 2.1193725920747966,
    'https://stackoverflow.com/': 0.25252791692037135,
    'https://whitehouse.gov': 0.3117579030804336,
}
total = 3.5655051020439714

I’m going to work on a benchmarking script that uses selenium to drive the chrome browser itself and measure times quantitatively that way. If the issue isn’t measurable / reproducible I can’t expect much help. Things get tricky with caching, because when a web page is cached it does load fairly quickly even if IPFS is running. So, I’ll need to account for that in the benchmark.
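In the meantime, here’s a rough sketch of the cache-aware timing loop I have in mind (stdlib urllib rather than selenium, so it won’t capture full browser behavior; the URL list and repeat count are placeholders). Timing each URL several times and reporting the cold first fetch separately from the warm repeats is one way to keep caching from hiding a slow first load:

```python
import time
import urllib.request
from statistics import median


def fetch_once(url, timeout=10):
    """Fetch a URL and return elapsed seconds (body is read and discarded)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start


def benchmark(urls, repeats=3, fetch=fetch_once):
    """Time each URL `repeats` times; report cold (first) vs warm (rest)
    so caching effects don't hide a slow first load."""
    results = {}
    for url in urls:
        times = [fetch(url) for _ in range(repeats)]
        results[url] = {
            'cold': times[0],
            'warm_median': median(times[1:]) if len(times) > 1 else None,
        }
    return results
```

Running `benchmark` back-to-back with the daemon on and off, and comparing the cold times in particular, would be one repeatable measurement.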

Are you using any proxies?
Let me know, because that might be the fix: sometimes proxy software can’t process many connections (regular browsing doesn’t have the connection pattern p2p networking programs do).

I don’t think so. My setup is fairly basic. I have a NetGear R7350 router:

  • Router WAN port is connected to the modem
  • Router LAN ports are connected to several devices including my main PC running IPFS.
  • I’m using Cat7 Ethernet (10 Gbps) which should be more than enough (router has 1Gbps ports, and PC has 2.5Gbps ports).

The router configuration is nearly stock. All I’ve done is:

  • setup a WIFI name and password,
  • assigned static IPs to each of my devices, and
  • forwarded port 4001 to my main machine to expose the IPFS node.

Port forwarding… is the ipfs node running on your PC being used from other places?
If you don’t do port forwarding and the experience gets better, you can then blame the router or ISP.

Hello,

does every device connected to your router slow down? Judging by the price, it is not a very high-end router.

In your configuration you skipped the Addresses section, which is relevant for knowing which transports you are enabling. I’d recommend disabling QUIC and websockets and leaving only TCP, or alternatively leaving only QUIC.

Usually the problem is how routers handle connections. You have a grace period of 1m. You can open 1000 connections in one minute, even if your HighWater setting is 40. Some routers are very bad at dealing with that.
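For illustration, one mitigation along these lines would be shortening the grace period in the ConnMgr block so excess connections get pruned sooner (the "20s" value here is a hypothetical starting point, not a recommendation from the docs):

```json
"Swarm": {
  "ConnMgr": {
    "Type": "basic",
    "LowWater": 20,
    "HighWater": 40,
    "GracePeriod": "20s"
  }
}
```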

No, every other device on the router is fine. Just the one running the daemon is slow.

Not sure exactly what part of the config is sensitive and what is not. It looked like there was a private key in there, so I wanted to make sure I didn’t share that. The address section is:

  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ]
  },

I can try removing the quic parts. I also have to learn more about what “highwater” is. I’ll report back when I try this.

Disable QUIC in the Transports section of the config…

    "Transports": {
      "Multiplexers": {},
      "Network": {
        "QUIC": false
      },
      "Security": {}
    }

I’m not sure your network tests are very reliable. The IPFS daemon might be doing different things at different moments (i.e. sending something to someone won’t happen all the time). When your browsing gets slow you should probably write down the info from ipfs stats bw and try to correlate.

But all in all, it probably comes down to sharing bandwidth between different applications.

You’re right, they aren’t reliable. I’ve been unable to measure anything that I can automate and repeat to demonstrate this issue. I clearly see the effects of the issue in my everyday workflow - when the browser is chugging and I kill the daemon, the page loads almost instantly. This is manually repeatable, but I’d like to be more scientific about it.

Also note that this behavior happens even when I don’t have any data of relevance pinned on the node, so there shouldn’t be anyone downloading data from me. The only bandwidth usage would be chatter on the network (and I don’t entirely have a grasp of what type of chatter there is).

I didn’t know about ipfs stats bw, so that will be helpful. Maybe I can write a script that simulates a browsing pattern by doing a random walk through wikipedia. If I can also measure the browser bandwidth stats, then I ought to be able to measure the correlation quantitatively.
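A sketch of what that correlation script might poll: go-ipfs exposes the same bandwidth stats over its RPC endpoint at /api/v0/stats/bw (this assumes the default API address 127.0.0.1:5001, and treat the RateIn/RateOut response keys as assumptions about the response shape):

```python
import json
import urllib.request

API = 'http://127.0.0.1:5001/api/v0/stats/bw'


def fetch_bw(api=API):
    """Query the daemon's bandwidth stats. go-ipfs RPC calls are POST-only."""
    req = urllib.request.Request(api, data=b'', method='POST')
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())


def summarize_bw(stats):
    """Reduce a stats/bw response to rates in Mbit/s.
    Assumed keys: RateIn / RateOut, in bytes per second."""
    def to_mbps(bps):
        return bps * 8 / 1e6
    return {
        'in_mbps': to_mbps(stats['RateIn']),
        'out_mbps': to_mbps(stats['RateOut']),
    }
```

Logging `summarize_bw(fetch_bw())` every few seconds while browsing, then lining the samples up against page-load times, is the kind of correlation I have in mind.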

I tried that (I just ran ipfs config edit, saved the file, and restarted the daemon; is there any other step I need to ensure ipfs uses the new config?). But unfortunately it didn’t work.

After the daemon was running, I went to wikipedia, reddit, and github in 3 tabs. All of them were chugging and hanging on a blank loading page. The moment I killed the daemon, all 3 finished loading almost immediately. The thing that bothers me about this test is I’m never sure if I’m just getting lucky at the moment I kill the daemon. But in my experience loading these 3 sites in 3 tabs happens fairly quickly.

Do others running nodes not experience this problem? I was half expecting that people would just say: oh yeah I deal with that too. But from what I can gather, either most people don’t run IPFS nodes on their main workstation or they don’t notice any major negative impact? Is that true?

Was it hanging on load or on DNS resolution? It still sounds like your router is causing the problems (perhaps only to you).

On the workstation, no. On the router, it fully depends on the model/make etc.

Not having a good measurement of this is frustrating, but I think I’ve made a router change that resolves it. I’m not noticing the problem anymore, and before it was very noticeable, but without measurements I can’t help but wonder if it’s all in my head.

What I think might have fixed it is enabling IPv6 on my router. In the IPv6 settings page I changed the “Internet Connection Type” from “Disabled” to “DHCP”. I have the daemon running in the background now, and I’m not noticing any sluggishness. Although, after I changed the router setting and rebooted my machine, I did note what seemed to be a slowdown (again, no measurements; it could all just be me being crazy) that eventually went away.

Just to be clear, you mean that it is common for people to run IPFS on their main workstation, but they just don’t notice a slowdown?

Also, are you implying people run IPFS on the router itself?

No, I mean that the slowness happens because of the router, not because of anything happening on the machine itself. For example, if the router cannot handle a bunch of UDP connections related to IPFS, things like DNS might stop working because it also uses UDP, and then pages hang.
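One way to test that theory is to time DNS resolution separately from the rest of the page load: if resolution alone stalls while the daemon runs, the router’s UDP handling is a prime suspect. A minimal stdlib-only sketch (host names and ports here are just examples):

```python
import socket
import time


def time_dns(host):
    """Return seconds spent resolving `host` (exercises the DNS/UDP path)."""
    start = time.monotonic()
    socket.getaddrinfo(host, 80)
    return time.monotonic() - start


def time_connect(addr, port=80, timeout=5):
    """Return seconds spent opening a TCP connection to an IP address,
    bypassing DNS entirely."""
    start = time.monotonic()
    with socket.create_connection((addr, port), timeout=timeout):
        pass
    return time.monotonic() - start
```

Comparing `time_dns('wikipedia.org')` against `time_connect` to the already-resolved IP, with the daemon on and off, should show whether the hang is in name resolution or in the connection itself.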

I had an experience where I was downloading torrents and it got very slow whenever I used the maximum profile setting.
I was also using proxy software, and it turns out the software was bad at handling these connections, so I switched to another proxy and it got better, but I still had to tune down the profile settings.