Public cluster with grow-only pinset, where all peers are trusted + additional validation of the CRDT updates?

Hi!

More than a year ago I worked on a distributed app for archiving and sharing research data among a dynamically changing number of peers. The project used IPFS for file sharing and communication (via pubsub), and used a CRDT-based distributed index to store the set of CIDs corresponding to the items in a particular archive (which is, I guess, the equivalent of a particular cluster in IPFS-cluster terms). My version of the CRDT-based index was very similar to what IPFS-cluster now uses for dynamic consensus, with a few notable differences:

  • In IPFS-cluster, as far as I can understand, CRDTs are used to hold a distributed event log, with each entry corresponding to a pin/unpin action or changing the replication factor of a particular item. My index acted like a grow-only set of CIDs for users to replicate (depending on the settings of a particular archive, each file from an index is either downloaded and pinned by everyone automatically, or downloaded and pinned manually at will).
  • In IPFS-cluster, only trusted peers can modify the set of pinned items (by adding new entries to a distributed event log, I presume). In my case, everyone could connect to a particular archive and add new CIDs to the index, thus sharing their files with others. As a basic spam prevention measure, a peer must solve a proof-of-work for each update it pushes to the index, otherwise the update won’t be accepted by other peers.
  • In my case, the index itself is grow-only: there’s no way to remove a particular CID from the index. While the CRDTs used in IPFS-cluster are also grow-only, the pinset is not (again, if I understood correctly). Any particular pin can be removed from the pinset by a trusted peer.
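
A minimal sketch of the kind of index described in the list above: a grow-only set of CIDs where each update must carry a valid proof-of-work. Everything here is an assumption made for illustration (the hash construction, the `DIFFICULTY_BITS` value, and the names `pow_ok`, `solve_pow` and `GrowOnlyIndex`); it is not the original prototype's code.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 12  # hypothetical difficulty, kept low so the demo runs fast


def pow_ok(cid: str, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    """True if sha256("cid:nonce") starts with at least `bits` zero bits."""
    digest = hashlib.sha256(f"{cid}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve_pow(cid: str, bits: int = DIFFICULTY_BITS) -> int:
    """Brute-force the smallest nonce that satisfies pow_ok."""
    return next(n for n in count() if pow_ok(cid, n, bits))


class GrowOnlyIndex:
    """Grow-only set of CIDs: entries can be added but never removed."""

    def __init__(self) -> None:
        self.entries = {}  # cid -> nonce proving the work

    def apply(self, cid: str, nonce: int) -> bool:
        """Accept an update only if its proof-of-work verifies."""
        if not pow_ok(cid, nonce):
            return False
        # Keep the smallest valid nonce so replicas converge on identical state.
        self.entries[cid] = min(nonce, self.entries.get(cid, nonce))
        return True

    def merge(self, other: "GrowOnlyIndex") -> None:
        """Set union with another replica, re-validating every entry."""
        for cid, nonce in other.entries.items():
            self.apply(cid, nonce)

    def cids(self) -> set:
        return set(self.entries)
```

Because `merge` is a plain union, replicas can exchange state in any order and still end up with the same set of CIDs, while updates without a valid nonce are simply ignored.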

Otherwise, CRDT-based consensus and my distributed index are conceptually very similar, and my prototype used update and replication methods very similar to those now used in IPFS-cluster. Therefore, when I decided to come back and continue my work on this project, and learned that IPFS-cluster can now handle dynamic consensus, I started to wonder whether I could use IPFS-cluster as a new backend, replacing my own index implementation. The main questions are, obviously, whether it’s possible to configure a cluster so that all peers are trusted to add new pins to the pinset (and no one has permission to remove pins), and whether it’s possible to reject CRDT updates based on a certain condition (e.g. no PoW result was supplied in the metadata section of the pin being added to the pinset). What would you suggest?

Thanks!

Hi @v696973!

Yes. Actually, cluster uses a grow-only set as well, but this is backed by a “log”, which is actually a Merkle-DAG that is discovered and distributed over IPFS. The only difference from your system is this last part.

You can set trusted_peers to ["*"] and everyone is trusted.

There’s an unpin_disable option (Configuration - Pinset orchestration for IPFS), but it is more of a fail-safe against removing anything from ipfs even when the item is unpinned on the cluster. The option is hidden from the default config, as it was a quick fix until we could find the time to do things properly.

Currently, cluster processes or drops messages based on a pubsub topic validator. For example, if a message does not come from one of the “trusted peers”, it is dropped and has no effect (https://github.com/ipfs/ipfs-cluster/blob/master/consensus/crdt/consensus.go#L160). Adding a PoW check as additional validation would work.
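
The validation flow described above can be sketched roughly as follows. This is an illustrative model in Python rather than the cluster's actual Go code: `Message`, `trusted_sender`, `make_validator` and `has_pow_field` are hypothetical names, and the `"pow"` metadata key is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Message:
    """Hypothetical stand-in for a pubsub message carrying a pin update."""
    sender: str  # peer ID of the publisher
    cid: str
    metadata: dict = field(default_factory=dict)


Check = Callable[[Message], bool]


def trusted_sender(trusted_peers: Iterable[str]) -> Check:
    """Accept messages from the listed peers; "*" trusts everyone."""
    trusted = set(trusted_peers)
    return lambda msg: "*" in trusted or msg.sender in trusted


def has_pow_field(msg: Message) -> bool:
    """Placeholder extra check: require a "pow" entry in the pin metadata.
    A real check would verify the proof, not just its presence."""
    return "pow" in msg.metadata


def make_validator(*checks: Check) -> Check:
    """Accept a message only if every check passes; otherwise drop it."""
    return lambda msg: all(check(msg) for check in checks)


# Everyone is trusted, but updates without PoW metadata are dropped.
validate = make_validator(trusted_sender(["*"]), has_pow_field)
```

Composing checks this way keeps the trust rule and the extra condition independent, so a PoW check can be added without touching the trusted-peers logic.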


In short, two things here:

  • Supporting add-only clusters properly should be possible and it is not overly complex (will need support from go-ds-crdt to ignore any set-Remove operations). This is something I’m happy to put on the short-term roadmap.
  • Adding a PoW validator to the code is super easy, but it seems like a very specific use case. If you would like this to be supported upstream, the best way I can think of would be to have a crdtpow consensus component as a thin wrapper around the existing one, adding a custom pubsub validator.
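
To illustrate the first point, here is a rough sketch of what "ignoring set-Remove operations" could mean. It uses a simplified two-phase set (added elements plus tombstones), which is not the set implementation go-ds-crdt actually uses; `Delta`, `Pinset` and the `add_only` flag are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    """Hypothetical update: CIDs to add and CIDs to remove."""
    adds: set = field(default_factory=set)
    removes: set = field(default_factory=set)


class Pinset:
    """Simplified two-phase set with an optional add-only mode."""

    def __init__(self, add_only: bool = False) -> None:
        self.add_only = add_only
        self.added = set()
        self.removed = set()  # tombstones

    def apply(self, delta: Delta) -> None:
        self.added |= delta.adds
        if not self.add_only:  # add-only replicas ignore removals
            self.removed |= delta.removes

    def pins(self) -> set:
        return self.added - self.removed


normal, add_only = Pinset(), Pinset(add_only=True)
for pinset in (normal, add_only):
    pinset.apply(Delta(adds={"QmA", "QmB"}))
    pinset.apply(Delta(removes={"QmA"}))
```

After both deltas, `normal` no longer reports `QmA`, while `add_only` still reports both CIDs because the removal was never recorded.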

Overall, it seems easy to bring your use case to reality. I would appreciate it if you could open issues in the cluster repository to discuss the technical details. Would you be willing to work on any of this?