IPFS and distributed peer review

Hi All,

I wanted to share a use case I’ve been experimenting with. The core idea is that I want to separate peer review from publication.

Anyone should be able to publish their data anywhere, and that data should be able to be used anywhere.

But since we’ve de-coupled quality control from publication, how can we learn about the quality of that data, or whether it has been reviewed, now that it no longer has a single point of access (i.e. a specific book or siloed website) or identifiable origin?

Enter IPFS. Instead of thinking about the data’s origin, we can think about its content.
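
To make that concrete, here is a minimal Python sketch of the idea behind content addressing: the identifier is derived from the bytes themselves, so two identical copies get the same ID no matter which server they came from. (IPFS hashes are more elaborate than a bare SHA-256, but the principle is the same.)

```python
import hashlib

def content_id(data: bytes) -> str:
    """Identify content by what it is, not where it lives."""
    return hashlib.sha256(data).hexdigest()

copy_a = b"Lectio 1, critical edition, version 1.0"   # fetched from server A
copy_b = b"Lectio 1, critical edition, version 1.0"   # fetched from server B

# Same bytes, same identifier, regardless of "origin".
assert content_id(copy_a) == content_id(copy_b)
```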

To demonstrate, I’ve built a little review registry. http://dll-review-registry.scta.info/

It was originally built for the Digital Latin Library and intended as a registry for reviews of Latin editions, but there’s no reason it can’t support reviews of any file or any kind of data. If there is a URL for the data, a review can be created.

Anyone can log in with their GitHub credentials to leave a review of any piece of data. They simply have to provide a link to the data and the text of their review.

When they hit submit, the system retrieves the content of the provided URL. It then uses IPFS to pin that content to the registry’s node, making it available on the IPFS network. (Note: I’m having trouble getting port 4001 exposed at present, so pinned data is for the moment only available at the SCTA gateway (http://gateway.scta.info). But it is enough to give you the idea.)
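
In outline, the submission step looks something like the sketch below (assuming a local go-ipfs daemon exposing its standard HTTP API on port 5001; the actual registry code may differ):

```python
import requests

IPFS_API = "http://127.0.0.1:5001/api/v0"  # assumption: default daemon address

def fetch_and_pin(url: str) -> str:
    """Fetch the content behind a submitted URL and add it to the IPFS node.

    `ipfs add` pins the content by default, which is what makes it
    available to the rest of the network.
    """
    content = requests.get(url, timeout=30).content
    resp = requests.post(f"{IPFS_API}/add", files={"file": content})
    resp.raise_for_status()
    return resp.json()["Hash"]  # e.g. "Qm...", the content's IPFS hash
```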

Once a review is made, the registry becomes a discovery endpoint that any application can use to discover whether or not the content has been reviewed. See my early and primitive API docs: Swagger UI

Any application can send the endpoint a link to a file (or a pre-computed SHA-256 or IPFS hash), and the registry will return any reviews for that exact content/hash.
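
A client call might look like this (the `/reviews` path and `hash` parameter here are placeholders of my own; the real routes are in the Swagger docs linked above):

```python
import hashlib
import requests

REGISTRY = "http://dll-review-registry.scta.info"

def reviews_for(data: bytes) -> list:
    """Look up reviews by the SHA-256 of content we already hold locally."""
    digest = hashlib.sha256(data).hexdigest()
    resp = requests.get(f"{REGISTRY}/reviews", params={"hash": digest})
    resp.raise_for_status()
    return resp.json()
```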

The beauty is that, in typical IPFS fashion, you don’t have to know the “location” of the reviewed data. You can just send the service the data you have; the system will compute the hash and check to see if there are any reviews for identical content.
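
For the curious, an IPFS CIDv0 (the familiar “Qm…” string) is just a base58-encoded multihash: the byte 0x12 (sha2-256), the byte 0x20 (32-byte digest), then the SHA-256 digest itself. A sketch, with one caveat: `ipfs add` normally wraps files in UnixFS DAG nodes, so the CID of an added file is generally not the bare hash of its raw bytes.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58(raw: bytes) -> str:
    n = int.from_bytes(raw, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    return "1" * (len(raw) - len(raw.lstrip(b"\x00"))) + out  # keep leading zeros

def cidv0(block: bytes) -> str:
    """CIDv0 of a single raw block: base58(multihash prefix + sha256 digest)."""
    return base58(b"\x12\x20" + hashlib.sha256(block).digest())
```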

Here are two screenshots of independent applications using the review registry and reviewed data pinned to the IPFS network. In each case, you can see a little green review badge that has been retrieved separately from the service.

You can see the request in action here: http://scta-staging.lombardpress.org/text/lectio1 or see the screenshot below.

This second screenshot includes the IPFS hash as an indicator of the precise data that was reviewed. (See the bottom right corner.)

I’d be interested to hear from others working on similar questions: How could we use IPFS to create a global review registry of distributed content? How does this fit with related work you’re already doing? How can we collaborate further?


In the abstract, what you’re building is a centralized registry of metadata about decentralized content: the review and the metadata about the review are all metadata about a hash, and in this case the hash is the identifier for some publication stored on IPFS.

Some things to ponder:

  • If you add the text of the review, and the metadata about the review, to IPFS, then you get to treat that bundle of data as just more IPFS content (see the sketch after this list)
  • If you use IPLD to express the metadata, you will get powerful benefits
  • If you persist the data in a distributed way, using, for example, ipfs pubsub or a blockchain, then your entire peer review system would be distributed and serverless. Except…
  • If you do all of those things, then your only remaining point of centralization is the authentication tool (in this case you’re using GitHub), and you could consider using something like uPort or Keybase to decentralize even that.
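
To illustrate the first two bullets, here is a guess at what such a review bundle could look like as an IPLD object (dag-json style, where `{"/": <hash>}` denotes a link; every field name here is an illustrative assumption, not a fixed schema):

```python
review_node = {
    "reviewed": {"/": "QmHashOfTheReviewedContent"},   # link to the data under review
    "body": {"/": "QmHashOfTheReviewText"},            # the review text is IPFS content too
    "reviewer": "github.com/example-reviewer",         # centralized identity, for now
    "created": "2017-09-01T12:00:00Z",
}
# Adding review_node itself to IPFS yields a hash of its own, so reviews,
# reviews of reviews, and registry indexes can all reference each other
# the same way.
```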

@kosson @flyingzumwalt

FYI, I’ve been trying to think about this more and follow up on some of @flyingzumwalt’s suggestions; accordingly, I’ve been experimenting with combining Blockcerts and IPFS.

I’ve got a discussion thread going on the Blockcerts discussion board that might be relevant to our discussions here. See http://community.blockcerts.org/t/creating-a-certificate-for-document-with-an-ipfs-hash/549/5

I’d love to hear your thoughts, if you have any.

What’s your strategy against spam and other low-quality submissions?

I think low-quality submissions are an imperfection that doesn’t need to be solved. If a group of people is chosen to decide what is “low quality”, their bias would affect their decisions and control what other people see.
Urban Dictionary comes to mind when thinking about an ideal system where any and every submission is allowed.

Urban Dictionary doesn’t look unmoderated to me. I see very little advertising on Urban Dictionary, and that wouldn’t be the case if Urban Dictionary allowed anybody to post anything.

Besides straight spam, the website describes its policy as: “Don’t name your friends. We’ll reject inside jokes and definitions naming non-celebrities.” That means they do have criteria for rejecting content.

Urban Dictionary also depends on voting, where one user can’t easily cast 1,000 votes.

Fair point. The standards for a peer review site would have to be much higher than UD’s. Perhaps submissions that don’t receive more than 5 upvotes in 20 days could be automatically removed.

The problem is that you need mechanisms to enforce your standards. When your users each have only a limited number of IP addresses available, you can use certain kinds of moderation that you don’t have with IPFS. There’s no way to block bots with CAPTCHAs. You can’t easily prevent a person from simply upvoting every one of their own posts 5 times.

If a system like the above wants to scale to the point where spammers want to target it, it needs a way to establish trust and prevent spamming.

Would a “downvote” button and a “report spam” button help? If a submission receives too many downvotes compared to upvotes (say 96% down and 4% up), it appears on a downvote list. An administrator would then remove all the entries on the downvote list.
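
Something like this, perhaps (the 96% threshold is from above; the minimum vote count is an extra assumption of mine, so tiny samples don’t trigger the rule):

```python
def on_downvote_list(upvotes: int, downvotes: int, min_votes: int = 25) -> bool:
    """Flag a submission for administrator review once the ratio is bad enough."""
    total = upvotes + downvotes
    if total < min_votes:   # too few votes to judge
        return False
    return downvotes / total >= 0.96
```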

Downvotes only work when you have a system that ensures every person gets one vote. A website like Urban Dictionary does that by setting a limit of one vote per IP. Using IP addresses for trust isn’t perfect, but it’s better than nothing.
With IPFS you don’t have the ability to use that model of trust.

A spammer could also simply flag every page out there as spam, without there being any way to know that the same person made all the reports.

Sharing could be used as a filter to reduce the visibility of unwanted material. Let’s say I post some content and only 30 peers can see it. If it is spam, they will downvote it and not share it, and it will effectively sit in limbo. If it is valid, they will each share it with the peers who follow them, and those peers will share it forward. The home page would only show content that has been shared enough times.
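
As a toy model of that filter (all the numbers are invented):

```python
def visible_on_home_page(sharers: set, threshold: int = 10) -> bool:
    """Content surfaces only after enough distinct peers have re-shared it."""
    return len(sharers) >= threshold

print(visible_on_home_page({"alice", "bob"}))                  # False: still in limbo
print(visible_on_home_page({f"peer{i}" for i in range(12)}))   # True: widely shared
```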

That would be one way to approach the problem, but it’s non-trivial to think about who should peer with whom and how to set up the trust relationships.

These thoughts are extremely significant to the future potential of IPFS technology as we approach mechanisms that enable what we might be able to someday call Collective Intelligence. This collaborative power will be vital to the future democratic platform that we will build to overcome government corruption and hold political actors accountable for their integrity.

I will continue to think philosophically, so as to be able to advocate for these possibilities in public spaces of discourse in order for us to grow our coalition in non-technical, intellectual and political communities.
