Mirroring standard websites to IPFS as you browse them

I’ve had an interesting question for a while. Note that if this is indeed possible, I advise attempting it with caution, as it may lead to what some consider copyright infringement. Here is my question:

We know how IPFS works when browsing websites in the network. You type the URL http://127.0.0.1:8080/ipfs/my_hash_here into your browser, the js-ipfs or go-ipfs node on your computer loads up the website, and in the process it temporarily stores every file on it and seeds it for other nodes.

What I want to know: Suppose you have js-ipfs or go-ipfs installed. You are browsing a normal website on the internet, like deviantart.com or youtube.com or twitter.com or whatever. Is it possible to teach your web browser to also add and seed copies of those websites in the IPFS network as you visit them? Conversely, could the browser also learn to look inside IPFS for every file embedded on a website (image, video, etc)?

My idea is that if enough people had such a browser capability, they could essentially mirror parts of the existing internet onto IPFS as they visit them. This would only work to a limited extent: you can’t mirror the PHP scripts or MySQL databases behind a given site, so you couldn’t replicate the functionality its server offers. However, you could replicate certain static resources, which would be especially useful when browsing sites with a lot of images / videos / audio.

An example: Imagine your node automatically running “ipfs add” for every YouTube video you watch, transferring the source from Google’s servers into the network. When you later want to view that video, your IPFS node would know to look for it inside the network first and override the source, so you can watch the video on youtube.com without even loading it from its server!
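
To make the “look inside IPFS first” half more concrete, here is a rough Python sketch of the lookup step. It assumes some index from original URLs to IPFS hashes already exists; the `MirrorIndex` class and its method names are made up for illustration and are not part of js-ipfs or go-ipfs. The only thing taken from above is the local gateway URL form.

```python
from typing import Dict

# Local gateway prefix, in the same form as the URL mentioned earlier in this post.
LOCAL_GATEWAY = "http://127.0.0.1:8080/ipfs/"


class MirrorIndex:
    """Hypothetical in-memory map from original resource URLs to IPFS hashes."""

    def __init__(self) -> None:
        self._by_url: Dict[str, str] = {}

    def record(self, url: str, ipfs_hash: str) -> None:
        # Called after a resource has been added to IPFS (e.g. via "ipfs add").
        self._by_url[url] = ipfs_hash

    def resolve(self, url: str) -> str:
        # Return a local gateway URL if the resource is known,
        # otherwise fall back to the original URL unchanged.
        ipfs_hash = self._by_url.get(url)
        if ipfs_hash is None:
            return url
        return LOCAL_GATEWAY + ipfs_hash


index = MirrorIndex()
index.record("https://example.com/cat.png", "my_hash_here")
mirrored = index.resolve("https://example.com/cat.png")
untouched = index.resolve("https://example.com/dog.png")
```

The part this sketch skips is the hard one: the index here lives only on one machine, and I don’t know how other nodes would learn which hash corresponds to which URL.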


I think this could work using a SOCKS5 proxy that would simply hash and store every response along with its headers, and return the stored responses based on the cache header or some other identifier.
This would basically be like creating a tiny Wayback Machine, but I don’t know if the “look inside IPFS” part would be so easy to implement (it might be more trouble than it’s worth).
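
Roughly what I have in mind for the hash-and-store part, as a Python sketch. It leaves out the proxy itself and IPFS entirely: the SHA-256 hex digest is only a stand-in for an IPFS hash, and the `ResponseStore` class is a made-up name for illustration.

```python
import hashlib
from typing import Dict, Optional, Tuple


class ResponseStore:
    """Hypothetical content-addressed store for HTTP responses."""

    def __init__(self) -> None:
        # digest -> body bytes (identical bodies are stored once)
        self._blobs: Dict[str, bytes] = {}
        # url -> (digest, headers)
        self._index: Dict[str, Tuple[str, Dict[str, str]]] = {}

    def put(self, url: str, headers: Dict[str, str], body: bytes) -> str:
        # Hash the body; a real setup would add it to IPFS and keep the returned hash.
        digest = hashlib.sha256(body).hexdigest()
        self._blobs[digest] = body
        self._index[url] = (digest, dict(headers))
        return digest

    def get(self, url: str) -> Optional[Tuple[Dict[str, str], bytes]]:
        # Return the stored (headers, body) for a URL, or None if it was never seen.
        entry = self._index.get(url)
        if entry is None:
            return None
        digest, headers = entry
        return headers, self._blobs[digest]

    def blob_count(self) -> int:
        return len(self._blobs)


store = ResponseStore()
first = store.put("https://example.com/a.png", {"Content-Type": "image/png"}, b"pixels")
second = store.put("https://example.com/b.png", {"Content-Type": "image/png"}, b"pixels")
cached = store.get("https://example.com/a.png")
```

Two URLs that serve the same bytes end up with the same digest and a single stored copy, which is the deduplication you would get from IPFS anyway. Deciding when a stored response is stale (the cache header part) is not handled here.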


Parts of such a mechanism do exist to some extent; however, I don’t know of a project that wraps them in a single, easy-to-use package/tool.

Specifically, see https://github.com/oduwsdl/ipwb (work in progress). This project currently more or less supports “parts 2 and 3” of your idea (“seed copies of websites [extracted from WebArchive .warc format] in the IPFS network” + “look inside IPFS for every file embedded on a website”).

For “part 1”, that is, scraping a browsed website into a .warc archive file, see e.g.:


I’ve been tracking related discussions for some time, see notes and other threads linked at: