Pinning strategies

I am exploring IPFS for uses in sharing/publishing scientific data at the DAG level (i.e. linked data structures, not files). I wonder if there is any add-on software layer to help with the management of pinning in such a context.

Imagine I have a few applications that process the same kind of data. They can run on a server, a desktop, or a mobile device. To access IPFS, each application uses an IPFS node running on that device (which I think is the only way to use IPFS at the moment).

From the applications’ point of view, the two main types of actions are “get published data for local use” and “publish my own data”. For an application running on a permanently running server, the first requires no specific action, and the second is just pinning. Fine. But when the application runs on a desktop or even mobile device, the first action could require “pin locally to make the data available offline”, whereas the second means “pin on a server/pinning service and stay online long enough to ensure that the server has retrieved the data”.

I wouldn’t even know how to implement the last part, but my main question is how I would go about implementing such management strategies separately from the application logic. IPFS by itself only supports pinning to the local node, unless I have missed something. But perhaps there is some layer on top of IPFS that helps with the rest?

Maybe you can look at ipfs-cluster, to coordinate your user nodes with a stable node?
Not sure how well it suits your use case, I’ve never used it.

I forgot to mention that I had already looked at IPFS Cluster. Unless I missed something, it is intended for clusters of always-up nodes and handles data and load distribution. A node that drops out of the cluster has to be put back in manually (if I understood correctly), which makes it inappropriate for mobile devices.

I think you’re asking how to keep the pins of one ipfs node in sync with another ipfs node. ipfs does not have this functionality. The following message references github issues in cluster that are relevant and worth keeping an eye on:

I know you said that you already looked at cluster, but I thought it worth mentioning that this may be the most relevant out-of-the-box behavior you’re going to find:

https://cluster.ipfs.io/documentation/internals/

Thanks for those pointers, which helped me clarify which problem I am trying to solve. It’s not syncing pins between nodes, it’s configuration of pinning services at the device level.

From the application’s point of view, what I am looking for is a service/API with the following functionality:

  1. Take this CID and send it to an appropriate pinning service.
  2. Tell me when all my pinning requests have been handled so that I can safely go off-line.

On a permanently connected node, local pinning could well be the default (or only) pinning service, but not on a mobile device.

The main job of the API layer I am looking for would be to know the pinning services preferred by the user and the rules for picking one of them (e.g. by the size of the data, or its desired permanence). For a starter, I’d be happy with being able to configure just a single pinning service.
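The two requirements above (route a CID to a suitable pinning service, then report when it is safe to go off-line) could be sketched roughly as follows. This is only an illustration: PinningService, Rule and PinningManager are hypothetical names, not part of IPFS or any existing library, and the submit/is_pinned callables stand in for whatever a real pinning service would offer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PinningService:
    """Hypothetical wrapper around one pinning backend (local node, remote service, ...)."""
    name: str
    submit: Callable[[str], None]     # ask the backend to pin a CID
    is_pinned: Callable[[str], bool]  # has the backend retrieved and pinned it?


@dataclass
class Rule:
    """Hypothetical user preference: use `service` when `matches(size_in_bytes)` is true."""
    matches: Callable[[int], bool]
    service: PinningService


@dataclass
class PinningManager:
    rules: List[Rule]
    default: PinningService
    pending: Dict[str, PinningService] = field(default_factory=dict)

    def choose(self, size: int) -> PinningService:
        # The first matching rule wins; otherwise fall back to the default service.
        for rule in self.rules:
            if rule.matches(size):
                return rule.service
        return self.default

    def pin(self, cid: str, size: int) -> str:
        # Requirement 1: send the CID to an appropriate pinning service.
        service = self.choose(size)
        service.submit(cid)
        self.pending[cid] = service
        return service.name

    def safe_to_go_offline(self) -> bool:
        # Requirement 2: true once every request has been handled.
        done = [cid for cid, svc in self.pending.items() if svc.is_pinned(cid)]
        for cid in done:
            del self.pending[cid]
        return not self.pending
```

An application would call pin() for each CID it publishes and poll safe_to_go_offline() before shutting down its node. On a permanently connected machine the default service could simply be local pinning.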

@khinsen Apologies, but ipfs itself does not do this. If ipfs-cluster does not do what you want, then you’ll have to write some code yourself or rely on a third party (I don’t know of a third party offering this, but someone else may). If this is something you think either ipfs or ipfs-cluster should do, consider adding issues in the appropriate repositories:

or

Is there a less-than-ideal solution along the lines of periodically running ipfs dht findprovs QmHash until you find a provider with a different peer id from the node you want to shut down?
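A rough sketch of that polling idea, assuming the ipfs binary is on the PATH and that ipfs dht findprovs prints one peer ID per line. The function names, the polling interval and the attempt limit are made up for illustration. Note that a provider record for the root CID may not mean the other peer holds the whole DAG, so this is a heuristic at best.

```python
import subprocess
import time
from typing import Callable, List, Optional


def cli_find_providers(cid: str) -> List[str]:
    # Assumption: `ipfs dht findprovs <cid>` prints one provider peer ID per line.
    out = subprocess.run(
        ["ipfs", "dht", "findprovs", cid],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def wait_for_other_provider(
    cid: str,
    own_peer_id: str,
    find_providers: Callable[[str], List[str]] = cli_find_providers,
    interval: float = 30.0,   # seconds between polls (arbitrary choice)
    max_attempts: int = 20,   # give up after this many polls (arbitrary choice)
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until some peer other than `own_peer_id` provides `cid`.

    Returns that peer's ID, or None if none showed up within max_attempts.
    """
    for attempt in range(max_attempts):
        for peer in find_providers(cid):
            if peer != own_peer_id:
                return peer
        if attempt + 1 < max_attempts:
            sleep(interval)
    return None
```

The node's own peer ID can be read from the output of ipfs id. The find_providers and sleep parameters are injectable so the loop can be tried out without a running node.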

Thanks for your comments! I started looking seriously into IPFS only two weeks ago, so I don’t dare judge what IPFS or IPFS-cluster should do. I am just wondering how to solve a problem I encountered in my first experiments, working on a laptop that is often off-line.

If there’s anything that I think should be done at the IPFS level, it’s a discussion of mobile devices in the documentation.

I found this blog post from the Textile people that seems relevant. Apparently they work a lot with mobile devices, and their “cafés” look like a solution to my problem. I’ll have to check that out in more detail.

Hey, glad to be able to tell you that this is no longer the case. We are finalizing a release, but current master and the latest release candidate (0.11.0-rc2) allow launching cluster peers with ipfs-cluster-service daemon --consensus crdt, which removes all the constraints around “fixed-always-online” peers that we had. Cluster peers can come and go as they want without any side effects in the system.

Cluster also offers a REST API that you can use to “add”, i.e. instead of ipfs add you could use ipfs-cluster-ctl add to add content to the local and the remote node at the same time. You can also pin something with ipfs-cluster-ctl pin add --wait and wait until things have been pinned everywhere they should be pinned. We can help you set up an embedded cluster peer if you want to use this as a library from inside a different application.
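From an application, the pin-and-wait step could be driven by shelling out to the command mentioned above. This is a minimal sketch: pin_add_command and pin_and_wait are hypothetical helper names, and treating a zero exit status as "pinned everywhere" is an assumption about ipfs-cluster-ctl, not documented behaviour.

```python
import subprocess
from typing import Callable, List


def pin_add_command(cid: str) -> List[str]:
    # The command line from the post above: ipfs-cluster-ctl pin add --wait <cid>
    return ["ipfs-cluster-ctl", "pin", "add", "--wait", cid]


def pin_and_wait(cid: str, run: Callable = subprocess.run) -> bool:
    """Run the pin command and block until it returns.

    Assumption: a zero exit status means the pin completed successfully.
    """
    result = run(pin_add_command(cid))
    return result.returncode == 0
```

The run parameter defaults to subprocess.run and is only there so the wrapper can be exercised without a cluster peer installed.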

The main issues until we have a final 0.11.0 are that we haven’t written the docs (we are actually revamping all of them) and that there are a couple of rough edges around the UI, plus bugs that we are fixing as we find them. I can try to help you get going in the meantime.

Thanks for the good news! This opens up new possibilities, such as using ipfs-cluster for syncing data between several devices. A bit like Syncthing for public data (or encrypted private data). Or crowd-sourcing data publication, using a large number of unreliable nodes to create a cluster that is much more reliable as a whole.

Right now I am working on different things, so I can wait for the official release to try this out, but thanks nevertheless for the offer to help with potential bugs!