Trying to better understand the pinning concept!

Hi There!

I’ve been writing an IPFS-for-beginners article, and I realized I was missing something conceptually. Perhaps this thread can help others too.

To upload content permanently, I do:

  1. ipfs daemon
  2. ipfs add ./my-file.jpg
  3. curl gateway.ipfs.io/ipfs/${my-file.jpg hash}

At this point, the data has been “spread” to public nodes, as it was requested through a public gateway node.

However! That data will be “garbage collected” from the public gateway nodes eventually, making the data non-permanent.

I understand that to avoid garbage collection, one should “pin” that data.

  1. ipfs pin add ipfs-path

My question is:

If I use ipfs pin add, will all nodes that come in contact with that content also pin the content? Or is the content pinned only on the node that the CLI tool is communicating with?

If it’s the latter, does that mean killing my local daemon leaves the content effectively “unpinned”?


The content isn’t ever permanently uploaded unless someone hosts it permanently. Under the surface, IPFS is closer to BitTorrent than Freenet/GNUnet. You never host data for anyone else, only yourself. So your content isn’t permanently uploaded just by asking another node to download it.

If you pin a piece of content, you’re only telling your own node to not garbage collect it. Other nodes can do as they please, they won’t even know you’re pinning it.

Other nodes can download it for as long as your node is online. If you shut down your computer, they can’t download it from you, but they might still be able to get it from other nodes. When you turn on your computer again, it’ll be available from your node again.

Also note that highly desired content gets a double availability boost: firstly, it picks up more seeders; secondly, it won’t get garbage collected, because you’ll never throw away active content when there’s more stale content to throw away instead.

Also, your content is automatically pinned when you add it. You only need to do pin add to pin content someone else uploaded. So step number 3 will not do very much other than put unnecessary stress on the gateway, since they don’t tend to have much bigger caches than any other node.
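For example (assuming a stock go-ipfs CLI; my-file.jpg is just a placeholder), you can confirm the automatic pin yourself:

    # adding a file pins it recursively by default
    ipfs add ./my-file.jpg

    # the new hash should show up in your recursive pins
    ipfs pin ls --type=recursive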


Thanks @es_00788224 ! This is pretty much what I expected to be true.

So really what you’re saying is: to use IPFS as a host, you pretty much “have to host a permanent gateway to play”.

Are there good examples of how one would set up a semi-private remote gateway for hosting their own content?

I understand I’d run the daemon on a VPS, and probably use NGINX to proxy requests to it, but that’s concerning for security reasons. I’d love to know exactly how to safely deploy a remote daemon that only I can add content to.

The answer is to just run a normal IPFS node on a VPS, not a gateway. Basically, gateways are just caching proxies.

To upload content, you’d:

  1. Add it to your local machine.
  2. Run ipfs pin add /ipfs/path_from_the_previous_command on the VPS node and wait for it to fetch the content from your local machine.

Now, if someone asks a gateway (either their own gateway or gateway.ipfs.io) for your content, the gateway will either serve it up from a local cache (if it already has it) or ask the IPFS network for it. Given that you’ve pinned your content in an always-online node, the gateway should always be able to fetch the content from somewhere.
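A rough sketch of those two steps (the hostname and hash below are placeholders):

    # on your local machine
    ipfs add ./my-file.jpg
    # -> added QmYourHashHere my-file.jpg

    # on the VPS (here via ssh), tell its node to fetch and pin the content
    ssh you@your-vps "ipfs pin add /ipfs/QmYourHashHere"

The VPS node fetches the blocks from your local node over the IPFS network, so keep the local daemon running until the pin command returns.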

I was thinking about this too: putting a node on a remote server (e.g. Uberspace, DigitalOcean), just to see how it behaves running 24/7. So I guess some form of write-up for less experienced users would be great, once you’ve managed to set up your node.

And if I understand it correctly, you can use ipfs-cluster to synchronize pins on multiple nodes, so you wouldn’t even have to log onto the remote server. That wouldn’t be the right choice, though, if you want to keep the local & remote contents separate.

I think your question about running a gateway connects to the discussion here:

The easiest solution is probably using existing software. Generate a new SSH key without a passphrase for the account on the server, and restrict that key to executing only a certain command. Then write a client-side script that takes two arguments (pin|unpin, IPFS_HASH), connects to your server with the key, and runs a shell script on the server with those same arguments. The server-side script of course checks validity before executing (i.e. the first argument is pin or unpin, and the second argument is 46 characters long, begins with “Qm”, and only contains A-Za-z0-9).
If you don’t mind typing in your password, you can simply run ssh username@hostname "ipfs pin add IPFS_HASH".
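A minimal sketch of that server-side validation script (the file name, paths, and the mapping of pin/unpin to ipfs commands are my own assumptions; unpin maps to ipfs pin rm):

    #!/bin/sh
    # validate-pin.sh - only ever runs "ipfs pin add" or "ipfs pin rm"
    ACTION="$1"
    HASH="$2"

    # first argument must be "pin" or "unpin"
    case "$ACTION" in
      pin|unpin) ;;
      *) echo "invalid action" >&2; exit 1 ;;
    esac

    # second argument: 46 characters, begins with "Qm", only A-Za-z0-9
    echo "$HASH" | grep -Eq '^Qm[A-Za-z0-9]{44}$' || { echo "invalid hash" >&2; exit 1; }

    if [ "$ACTION" = "pin" ]; then
      ipfs pin add "/ipfs/$HASH"
    else
      ipfs pin rm "/ipfs/$HASH"
    fi

If you wire this up as a forced command (command="..." in authorized_keys), note that the arguments the client sends arrive in $SSH_ORIGINAL_COMMAND rather than as positional parameters, so the script would need to parse that instead.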

Thanks a lot everyone - running ssh u@h ipfs pin add ${hash} is a good solution.

That said, part of the reason I’m researching this is for writing deploy plugins for Webpack or Broccoli.js.

Right now, the IPFS JS wrapper doesn’t support pinning - but even when it does, it’s unlikely to handle connecting to remote/non-local nodes.

It would be trivial to make an (express / sinatra etc) app that provides a simple HTTP pinning API via a private API key, and run it on your remote VPS (with the daemon).

That way you can add locally & trigger a pin remotely via an API call. That said, this feels like a really common need for folks using IPFS, and I’d be surprised if there wasn’t something like this in the ecosystem already.

I’ve heard there’s an official HTTP API for connecting to nodes.

Does anyone know of such a way to pin remotely (and thus permanently) in a programmatic sense?

What about doing something like ipfs pin add $(ipfs name resolve $HARDCODED_IPNS) in a loop? Then it’ll pin everything you publish at that name. You might want to do something to unpin the old versions though, for example using ipfs pin update.
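A rough sketch of that loop body (the IPNS name is a placeholder, and the state file is just one way to remember the previously pinned version):

    #!/bin/sh
    # pin whatever /ipns/<name> currently points to, replacing the old pin
    IPNS_NAME="/ipns/QmYourPeerIdOrKey"
    STATE_FILE="$HOME/.last-pinned"

    NEW=$(ipfs name resolve "$IPNS_NAME") || exit 1
    OLD=$(cat "$STATE_FILE" 2>/dev/null)

    if [ -z "$OLD" ]; then
      ipfs pin add "$NEW"
    elif [ "$OLD" != "$NEW" ]; then
      # swap the old pin for the new one in a single step
      ipfs pin update "$OLD" "$NEW"
    fi

    echo "$NEW" > "$STATE_FILE"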

That’s a cool pattern - but doesn’t really solve my case of wanting to add something on a local daemon, and remotely pin it “programmatically”. The above will only pin locally, so when that node is down (like shutting down my computer) the pin will disappear.

What I’m currently thinking is that I’ll make a simple JSON API designed to run on a VPS, which then privately makes calls to an IPFS daemon running on that same VPS.

That way I can pin stuff remotely via the API. Does anyone know if that exists already? Seems like a common use case.

The remote pin scenario could be solved with pubsub: publish a “pin hash” message signed with your private key. The VPS node could be configured to listen and act only on your signed pubsub commands. What do you think?


@jplaurin sure, but are hashes (that you’re going to pin) actually something that requires encrypting? They’re kind of public, right? Pubsub also feels like overkill for what could be a simple request-response cycle:

I’m thinking it’s a simple long-running service that only exposes the pin-related endpoints of the official HTTP API, and simply handles Authentication (via an API Key Header - this will be fine over SSL):
https://ipfs.io/docs/api/

I.e.:

  1. curl "http://myremoteIPFS.node/api/v0/pin/add?arg=<ipfs-path>&recursive=true&progress=<value>"

  2. Internal service validates the IPFS-API-Key header

  3a. If auth succeeds, the service passes the request off to the IPFS node's officially supported HTTP API and returns the response

  3b. If auth fails, the request returns 403
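For illustration only, the client-side call might look something like this (the IPFS-API-Key header and hostname are made up for the example; note that recent go-ipfs versions expect POST on API endpoints):

    curl -X POST \
      -H "IPFS-API-Key: $MY_SECRET_KEY" \
      "https://myremoteIPFS.node/api/v0/pin/add?arg=/ipfs/QmYourHashHere&recursive=true"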

PS: this could all likely be configured in NGINX too, I think.

This service could also be in charge of health-checking the underlying daemon, and starting it when the app restarts.

Really, the big question of this thread is: Why doesn’t the IPFS HTTP API have auth built in?

Asked the API Auth Question over here:

I was suggesting pubsub simply to avoid a centralised solution. With an HTTP API, you rely on a location-addressed service: you always need to know where to reach it. With pubsub, the service could move anywhere and still process the request. You could have many nodes located all around the planet providing the service, with no need for a centralised DNS service.

Yes, it does. You run that command on the VPS. To update it periodically, use a shell script or cron job. If you want multiple hashes, you can use the ipfs object command and publish the root hash to IPNS; ipfs pin is recursive by default.
There is absolutely no need for an HTTP API here.

Oh got it - cool! Makes a lot of sense.

Ok cool, I’d like to avoid setting up an HTTP API.

Ok - so I’m probably missing something here. Would you be willing to explain via example?

I’m writing a deploy script (in Node.js) for a JS browser app; the script is designed to run locally, deploy the assets to IPFS, and pin them somewhere other than the local daemon.

I do:

  1. Build the JS & CSS etc
  2. Spin up the local daemon via (let ipfs = new IPFS())
  3. Add each built file to the local daemon (via ipfs.files.add)
    4?. Programmatically drop into a shell (from within the script, via Node.js exec/spawn) and run the pinning commands (and potentially the IPFS commands) on the VPS over SSH (assuming localhost has an authorized SSH key for the VPS that ideally doesn’t require a passphrase)?

Step 4 is the confusing one to me. I’m not sure how a cron job would help here? Is step 4 correct or should it be done in some other manner as a part of the deploy step?

@hhff you could use ipfs-cluster for this. Set up a cluster where your main machine is the leader and any number of storage servers are followers. Whenever you want your storage servers to pin something, just add the desired hashes to the cluster’s pin set.
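For example, once the cluster peers are running, pinning across all of them is a single command (the hash is a placeholder):

    # tell the whole cluster to pin the content; follower nodes fetch and pin it
    ipfs-cluster-ctl pin add QmYourHashHere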

cc @hector

If you want redundancy, ipfs-cluster can help you. Otherwise, work directly with the IPFS HTTP API and a single node. You can always drop cluster in later, as it proxies the IPFS API, so services built on top keep working as before.

Step 4: update your IPNS hash to point to the latest version of the folder with your static content
Step 5: resolve the IPNS hash and pin the latest version of the folder with your static content on your server

You don’t have to do step 5 as part of the deploy script. Put a cron job on the server that does step 5 every 10 minutes. The IPNS hash is still updated from your computer, so this doesn’t introduce any new lag.
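As a sketch of those two steps (the IPNS key is a placeholder, and cron’s restricted environment may need full binary paths): the deploy script ends with something like ipfs name publish /ipfs/<root hash> on your machine, and the server keeps the pin current with a cron entry:

    # on the server: crontab -e
    # every 10 minutes, resolve the IPNS name and pin whatever it points to
    */10 * * * * ipfs pin add $(ipfs name resolve /ipns/QmYourPeerId)

Old versions stay pinned with this approach; swap pin add for the pin update pattern mentioned earlier in the thread if you want to drop them.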