Minerva: Build the Hadoop-Hive on IPFS

Hi

A big data system on IPFS is still missing, so we built Minerva, which can be regarded as Hive on IPFS. With Minerva, you can use standard SQL to query the content of files stored on IPFS (JSON and CSV formats).

Minerva is built on Drill and IPFS. Technically, it is a Drill storage plugin that connects IPFS's decentralized storage to Drill's flexible query engine. Any data file stored on IPFS can be accessed from Drill's query interface, just like a file stored on a local disk.
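To give a feel for what querying looks like, here is a minimal sketch that sends a SQL statement to a local Drill instance through Drill's REST endpoint (`/query.json` on the default web port 8047). The `ipfs` plugin name, the `#json` format suffix in the table path, and the helper names `build_query_request` and `run_query` are assumptions for illustration; check the repository for the exact syntax your build expects.

```python
import json
import urllib.request

# Drill's web/REST port defaults to 8047; adjust if your instance differs.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_request(ipfs_path, fmt="json", limit=10, drill_url=DRILL_URL):
    """Build (but do not send) a POST request that asks Drill to run a query.

    Assumption: the storage plugin is registered as `ipfs` and the file
    format is given as a `#<format>` suffix on the IPFS path.
    """
    sql = f"SELECT * FROM ipfs.`{ipfs_path}#{fmt}` LIMIT {int(limit)}"
    payload = {"queryType": "SQL", "query": sql}
    return urllib.request.Request(
        drill_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(request):
    """Send the request to a running Drill instance and return the result rows."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["rows"]
```

With Drill and the IPFS daemon both running, something like `run_query(build_query_request("/ipfs/<CID>"))` should return the first rows of the file as a list of dictionaries, where `<CID>` is the hash of a JSON file you have added to IPFS.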

The basic idea is very simple: run a Drill instance alongside the IPFS daemon, and you can connect to other IPFS users who are also running Minerva. If one of them happens to have stored the file you are trying to query, Drill can send the execution plan to that node, which executes the operations locally and returns the results. Of course, other users can benefit from your node as well if you are sharing the data they want. The more people run Minerva, the more distributed and efficient data sharing and querying become!
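The placement idea can be illustrated with a toy model. This is not Minerva's actual planner code, only a sketch of the decision described above: given which peers hold which files, send the work to the peers that already store the file, and fall back to the local node otherwise. The function name `pick_executors`, the peer names, and the `providers` mapping are all hypothetical.

```python
def pick_executors(cid, providers, local_peer):
    """Return the peers that should execute the scan for `cid`.

    `providers` maps a peer ID to the set of content hashes it stores.
    Peers that already hold the file run the operations locally; if no
    known peer has it, the local node does the work itself.
    """
    holders = sorted(peer for peer, cids in providers.items() if cid in cids)
    return holders if holders else [local_peer]


# Toy example: two remote peers, one of which stores the requested file.
providers = {
    "peer-A": {"QmFile1", "QmFile2"},
    "peer-B": {"QmFile3"},
}
print(pick_executors("QmFile1", providers, local_peer="me"))  # ['peer-A']
print(pick_executors("QmFile9", providers, local_peer="me"))  # ['me']
```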

If you are interested, we have made a few slides that explain the ideas in detail:

Any suggestions are welcome. :slight_smile:

Find the code on GitHub: https://github.com/bdchain/Minerva

A live demo: http://www.datahub.pub (it may be unstable; please bear with it)