Do I need to manually connect to every member in a network?

I’ve been super excited about IPFS, and I finally started experimenting with it across multiple machines. But I’ve been running into a bunch of problems getting nodes to talk to each other with pubsub or even accessing content that was added to the network.

I thought I was doing something wrong with configuration, but then I ran into an older thread talking about Relays.

Relays are super cool to be sure, but they seem to be aimed at adding an unexposed server to my swarm, not blindly forwarding content or acting like a general access point. So say someone wants to get a piece of content and I’m the only person hosting it either because it’s just a tiny static blog/RSS feed, or because I’ve only just published it, or for whatever reason. Do they need to explicitly add my IP to their swarm in order to get that content?

This seems problematic, especially for things like pubsub. If someone publishes a message, but it only appears to people who are directly connected to that node, then how does anyone actually in practice build a serverless application? If I have a distributed application that every user is running locally, will every individual user who runs it need to know the IP addresses of every other user?

I feel like there must be something I still don’t understand about how swarms work, or that I’m not considering some other feature that makes more distributed access possible.

Do they need to explicitly add my IP to their swarm in order to get that content?

No. That’s what we use the DHT for. Basically, they’ll go to the DHT and ask who has the content. Assuming you haven’t disabled this feature (the provider system), your node will publish records to the DHT stating that you have the content.
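
As a rough illustration of the idea (not how the real DHT is implemented), provider records can be thought of as a shared lookup table from a content ID to the peers that have announced it. The sketch below is a toy, single-process model; the class name `ToyProviderIndex`, the peer names, and the content IDs are all made up for the example, and a real DHT spreads these records across many nodes.

```python
from collections import defaultdict


class ToyProviderIndex:
    """Toy stand-in for DHT provider records: content ID -> peers that have it."""

    def __init__(self):
        self._providers = defaultdict(set)

    def provide(self, content_id, peer_id):
        # A node announces "I have this content".
        self._providers[content_id].add(peer_id)

    def find_providers(self, content_id):
        # Anyone can ask who has the content, without knowing the host in advance.
        return sorted(self._providers.get(content_id, ()))


index = ToyProviderIndex()
index.provide("cid-of-my-blog", "peer-me")  # hypothetical names

providers = index.find_providers("cid-of-my-blog")
missing = index.find_providers("cid-nobody-has")
print(providers)
print(missing)
```

The requester only needs the content ID. It learns which peer to dial from the lookup, so nobody has to add your IP by hand.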

This seems problematic, especially for things like pubsub.

Pubsub will actually forward messages. That is, when a node receives a message on a topic to which they are subscribed, they’ll forward it to all subscribers to which they are connected. That means you just need one connection into the network of pubsub nodes listening on a topic to receive messages on that topic. Note: If you don’t have any such connections, we currently don’t make any effort to establish them. However, we’re working on that (we’ll use the provider system as we do with content).
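
To make the forwarding behaviour concrete, here is a small simulation under simplifying assumptions: connections are symmetric, only subscribed peers forward, and every message is flooded to all connected subscribers. The function name `flood` and the peer names are invented for this sketch; it is not the actual pubsub implementation.

```python
from collections import deque


def flood(links, subscribers, origin):
    """Return the subscribers that receive a message published by `origin`.

    links: dict mapping each peer to the set of peers it is connected to.
    subscribers: peers subscribed to the topic (only they forward).
    """
    received = {origin}
    queue = deque([origin])
    while queue:
        peer = queue.popleft()
        for neighbour in links.get(peer, ()):
            # Forward only to connected peers that are subscribed and new.
            if neighbour in subscribers and neighbour not in received:
                received.add(neighbour)
                queue.append(neighbour)
    return received


# A - B - C - D form a chain of subscribers.
# E is subscribed too, but its only connection is F, which is not subscribed.
links = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "F"},
    "E": {"F"},
    "F": {"D", "E"},
}
subscribers = {"A", "B", "C", "D", "E"}

reached = flood(links, subscribers, "A")
print(sorted(reached))
```

D receives A's message even though the two are never directly connected, while E misses it because it has no connection into the group of subscribers.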

By the way, excellent questions.


Relays

Relays serve two purposes:

  1. Dealing with NATs. Without relays, two nodes behind NATs/firewalls can’t talk to each other.
  2. Keeping fewer connections open. There’s usually less overhead in keeping indirect connections open than there is in keeping direct connections open. This will allow us to maintain connections to even more peers. Note: this will actually improve your node’s ability to find content, as it will effectively increase the number of peers to which your node is connected (without increasing the number of open sockets).

So that’s dht provide I’m assuming? Just tested from my desktop to my Raspberry Pi and I was able to add content and then access it over a terminal even though I’m not exposing any ports on my router and even though I’m not directly connecting them as peers. That’s pretty cool :smile:

I guess this also answers some of the more theoretical questions I had about overhead and latency, because not every single node needs to be a DHT. You just need to have enough of them to coordinate the swarm, and individual nodes can have more control over when explicitly they publish.

Huh. So I do still need some minor coordination to make sure nodes are connected to at least one other node in the network (at least until providers are up), but finding just one peer to connect to is significantly easier than having to connect to everyone. And like above, I guess the other advantage is that nodes don’t have to waste a bunch of time publishing or keeping track of messages they don’t care about?

Oh crud, I didn’t think about that! I got the whole “how do we have a peer behind a firewall” thing, but I was thinking of relays as just being a fallback mechanism, like you’d always prefer to directly connect if possible. But if I’m understanding this correctly, there’s good reason why you might want to make yourself available on a relay even if you’re publicly accessible, just so you can consolidate a bunch of peers and connect to all of them at once.

And I suppose there’s nothing to prevent you from connecting to multiple relays or also doing direct connections for anyone who really cares about the extra latency?

Thanks a ton, this makes a lot more sense. I’ve been watching a lot of talks and going through some of the discussions on the GitHub repos, but I really need to just sit down and spend a weekend or two doing a bunch of projects so I can make sure that, beyond the high-level concepts, I have all of the actual details straight.

So that’s dht provide I’m assuming

Yes.

Just tested from my desktop to my Raspberry Pi and I was able to add content and then access it over a terminal even though I’m not exposing any ports on my router and even though I’m not directly connecting them as peers. That’s pretty cool

You still won’t (generally) be able to transfer content without establishing some sort of connection (you just won’t have to connect manually). In this case, you may have used the provider system to find your Raspberry Pi, or, if both machines are on the same local network, you may have used mDNS (local network discovery), which connects you to the nodes on that network.

not every single node needs to be a DHT

Yep. You can actually stop acting as a DHT server by passing the --routing=dhtclient flag to your daemon (ipfs daemon --routing=dhtclient). Eventually, we’ll probably make this the default for non-server profiles (it saves battery and will actually make the network faster, as we’ll have fewer flaky DHT nodes).

don’t have to waste a bunch of time publishing or keeping track of messages they don’t care about

Yes. Eventually, we’d like to add a way to relay a topic without subscribing to it (e.g., by setting a config flag) but that’s not a priority at the moment.

But if I’m understanding this correctly, there’s good reason why you might want to make yourself available on a relay even if you’re publicly accessible, just so you can consolidate a bunch of peers and connect to all of them at once.

Exactly.

And I suppose there’s nothing to prevent you from connecting to multiple relays or also doing direct connections for anyone who really cares about the extra latency?

Yep. At first, we’ll probably just have nodes advertise the set of relays they support and the dialer will pick one. Eventually, we’d like connection migration/downgrading (i.e., we have a direct connection and then we negotiate which relay to use).

I think this answers all of the immediate questions I had. Thanks a ton, I feel like I’ve got a much better grasp on what’s actually happening on the network level now.