Distributed Wikipedia Mirror Update | IPFS Blog & News

Browsers with built-in support for IPFS addresses (Brave, Opera, or a regular Firefox or Chromium with IPFS Companion) can now load the latest snapshot using DNSLink:
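As background on the mechanism: a DNSLink is a DNS TXT record, published under the `_dnslink.` subdomain of a site, whose value has the form `dnslink=/ipfs/<CID>` or `dnslink=/ipns/<name>`. The sketch below only parses such a value and does no DNS lookup. The function names and the sample CID string are made up for illustration.

```python
from typing import Optional, Tuple

DNSLINK_PREFIX = "dnslink="


def dnslink_record_name(domain: str) -> str:
    """Return the DNS name where a DNSLink TXT record is conventionally published."""
    return "_dnslink." + domain.strip(".")


def parse_dnslink(txt_value: str) -> Optional[Tuple[str, str]]:
    """Split a TXT value like 'dnslink=/ipfs/<id>' into (namespace, id).

    Returns None when the value is not a well-formed DNSLink entry.
    """
    value = txt_value.strip().strip('"')
    if not value.startswith(DNSLINK_PREFIX):
        return None
    path = value[len(DNSLINK_PREFIX):]
    # "/ipfs/<id>/optional/subpath" -> ["", "ipfs", "<id>", "optional/subpath"]
    parts = path.split("/", 3)
    if len(parts) < 3 or parts[0] != "" or not parts[1] or not parts[2]:
        return None
    return parts[1], parts[2]


if __name__ == "__main__":
    # "bafyexamplecid" is a placeholder, not a real content identifier.
    print(dnslink_record_name("example.org"))
    print(parse_dnslink("dnslink=/ipfs/bafyexamplecid"))
```

A browser or gateway with IPFS support performs the TXT lookup itself and then loads the content at the resulting path.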


This is a companion discussion topic for the original entry at https://blog.ipfs.io/2021-05-31-distributed-wikipedia-mirror-update/

The web platform I’ve developed, called Quanta (https://quanta.wiki), is, I think, uniquely positioned to be the search engine, the browsing interface, and even the editing and upload interface for a new distributed Web3.0 Wikipedia. It’s federated, supports IPFS, and is for the most part a “completed” project/app. It can do industrial-strength, high-performance full-text search using MongoDB (i.e. Lucene) as its data storage (in addition to IPFS).

For years I’ve planned to load the DB with Wikimedia data to show what it can do as a competitor for Wikipedia. I may now look into ZIM files. Are ZIM files the standard Wikimedia file format? Standing up a new Wikipedia on Quanta is basically just a matter of importing the data. However, Quanta expects content to be plain text or markdown, so admittedly getting the formatting right for the articles would be a separate task.
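A rough idea of what that formatting step could start from, assuming the source articles arrive as HTML (an assumption, not something established in this thread): the sketch below strips markup down to plain text using only the Python standard library. `TextExtractor` and `html_to_text` are hypothetical names, and nothing here is part of Quanta.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, dropping script/style and breaking lines at block tags."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # character references are decoded by default
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        raw = "".join(self._chunks)
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = [" ".join(line.split()) for line in raw.splitlines()]
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<h1>Physics</h1><p>Study of <b>matter</b> &amp; energy.</p>"
    print(html_to_text(sample))
```

This loses links, tables, and infobox structure, so producing real markdown would need more work than shown here.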

Here’s a link:

Quanta has been mentioned in a previous IPFS Newsletter, but for some reason it hasn’t yet been added to their “Ecosystems” page, despite it being far larger and more feature-rich than probably any other project listed. Quanta is a ‘general purpose’ content platform, with a unique set of features and a powerful design unlike any other existing platform.

The openZIM project was launched by Wikimedia CH (Switzerland) and is actively maintained by Kiwix and supported by the Wikimedia Foundation.
https://wiki.openzim.org/wiki/OpenZIM

Thanks @lidel. Do you happen to know where I can download a subset of Wikimedia (rather than the 100GB download), to get, for example, only the field of “Physics” (or any small niche genre of documents), to use as content for a small technology demonstration? Preferably in JSON format.

I am not aware of a JSON version; however, there are ZIM files for subsets of English Wikipedia at Index of /zim/wikipedia.

Each ZIM is available in three versions:

  • maxi: the default full version.
  • nopic: full articles, but no images. About 75% smaller than the full version.
  • mini: only the introduction of each article, plus the infobox. Saves about 95% of space vs. the full version.

The ones you may be interested in are named wikipedia_en_physics_*.zim
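If you save the directory listing as a list of file names, picking out the right files could look like the sketch below. It assumes the flavour appears in the file name as `_maxi_`, `_nopic_`, or `_mini_`; that convention, the `select_zims` helper, and the sample names are illustrative assumptions, so check them against the actual index.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional

FLAVOURS = ("maxi", "nopic", "mini")


def select_zims(
    names: Iterable[str],
    pattern: str = "wikipedia_en_physics_*.zim",
    flavour: Optional[str] = None,
) -> List[str]:
    """Return the sorted names matching the glob pattern and, optionally, one flavour."""
    if flavour is not None and flavour not in FLAVOURS:
        raise ValueError(f"unknown flavour: {flavour!r}")
    matches = [n for n in names if fnmatchcase(n, pattern)]
    if flavour is not None:
        # Assumption: the flavour is embedded in the name as "_<flavour>_".
        matches = [n for n in matches if f"_{flavour}_" in n]
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical listing; real names and dates will differ.
    listing = [
        "wikipedia_en_physics_maxi_2021-05.zim",
        "wikipedia_en_physics_mini_2021-05.zim",
        "wikipedia_en_physics_nopic_2021-05.zim",
        "wikipedia_en_chemistry_mini_2021-05.zim",
    ]
    print(select_zims(listing, flavour="mini"))
```

For a small technology demonstration, the mini flavour is probably the quickest to download and experiment with.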

As far as I could determine, ZIM files are a proprietary format for a specific reader app called Kiwix. I know the topic-specific XML/JSON files are out there, because I’ve downloaded the physics one in the past.