Ubuntu archive on top of IPFS

elopio · December 11, 2017, 5:51pm

Well, not GPG. We sign the packages when we are uploading them to the archive. Then the main archive generates hashes, which is what you will see here and in some other dirs:

http://archive.ubuntu.com/ubuntu/dists/xenial/

The mirrors of the archive just copy what the main archive publishes. apt on the client machine downloads the files and verifies their hashes.
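The hash check described above can be reproduced by hand. The following is only a minimal sketch, not what apt runs internally: `verify_sha256` is a hypothetical helper name, and it assumes `sha256sum` from GNU coreutils is installed and that the expected digest was copied from the archive index files.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: returns 0 when FILE's SHA-256 digest equals EXPECTED_HEX,
# 1 on a mismatch, and 2 when FILE cannot be read.
verify_sha256() {
    local file="$1" expected="$2" actual
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    # sha256sum prints "<digest>  <name>"; keep only the digest.
    actual="$(sha256sum < "$file" | cut -d ' ' -f 1)"
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "hash mismatch for $file: expected $expected, got $actual" >&2
    return 1
}
```

Usage would look like `verify_sha256 some-package.deb <digest from the Packages index>`, where both the file name and the digest are placeholders.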

elopio · January 7, 2018, 4:48pm

A quick update here.

After leaving my machine synchronizing the mirror throughout my holidays, I finally have all the files locally.

I’m adding them to IPFS, which brought two unexpected problems. First, the recursive add will take a little more than 12 hours. The Ubuntu mirrors are supposed to sync every 6 hours, which will not be possible with IPFS, so we will have to sync only once a day.
Second, ipfs stores its own copy of each file as local blocks, which doubles the space required for the mirror: the main mirror will need ~4TB instead of ~2TB. As suggested before, the other mirrors can sync just the IPFS blocks, so they will still need ~2TB.

I’m now going to package the transport into a PPA so it will be easy to install, do more tests with my local mirror and continue trying to find a server with better bandwidth, and experiment with bitswap to sync multiple mirrors.

leerspace · January 7, 2018, 5:23pm

It might be worth looking into using IPFS’ filestore functionality to prevent space requirements from being doubled.

elopio · January 7, 2018, 5:27pm

hey, that sounds great! That way we also help testing that experimental feature. /me gives it a try.

koalalorenzo · January 7, 2018, 5:33pm

As said, you can use the filestore.
If somebody ends up here, here is the shortcut:

ipfs config --json Experimental.FilestoreEnabled true
ipfs add --nocopy $LOVE

It should work

elopio · January 7, 2018, 9:09pm

No luck with --nocopy, I got:

panic: interface conversion: interface {} is cmdkit.Error, not *coreunix.AddedObject

goroutine 37 [running]:
github.com/ipfs/go-ipfs/core/commands.glob..func7.1(0xc4202ec060)
        /cwd/parts/ipfs/go/src/github.com/ipfs/go-ipfs/core/commands/add.go:390 +0x9fc
created by github.com/ipfs/go-ipfs/core/commands.glob..func7.2
        /cwd/parts/ipfs/go/src/github.com/ipfs/go-ipfs/core/commands/add.go:449 +0xc7

I will report the bug and try to dig into it later. For now, I will use the command with copy.

Update: the problem is not caused by --nocopy. I reported the bug: https://github.com/ipfs/go-ipfs/issues/4555

elopio · January 10, 2018, 12:39am

Do you know if there’s a way to resume the ipfs add --recursive? When it fails and I have to re-run it, it seems to start from scratch.

leerspace · January 10, 2018, 1:40am

I don’t think it stores the progress anywhere to be able to pick up where it left off. Subsequent runs of ipfs add should be significantly faster (probably not much of a difference with --nocopy though) since the blocks up to the point where it stopped already exist in the datastore and don’t need to be written.

stebalien · January 10, 2018, 10:46pm

In terms of add time, a large portion of that is:

1. Our current datastore is slow. If you want something faster, try badgerdb. Unfortunately, that datastore backend is experimental for a reason.
2. One of our huge bottlenecks is telling everyone on the network that you have a file. We’re working on making this better but it’s a bit of a fundamental problem.

One partial solution is to:

1. Not announce to the network that you have the files (by adding them with ipfs add --local).
2. Have the APT backend connect to known IPFS mirror peers.

Unfortunately, that’s not very decentralized… (although the Ubuntu installations that use this backend will still announce the files they have to the network).

Ideally, we’d announce the root nodes of all pinned files to the network but we won’t be able to do that for a while.
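The second step of the partial solution (connecting to known mirror peers) could be sketched as below. `ipfs swarm connect` is an existing command, but the helper name `connect_mirror_peers`, the peers file, and its one-multiaddr-per-line format are assumptions made for illustration; nothing in this thread says the APT transport works this way.

```shell
#!/bin/sh
# connect_mirror_peers PEERS_FILE
# PEERS_FILE (hypothetical) lists one multiaddr per line; blank lines and
# lines starting with '#' are skipped. Returns 1 if any connection failed.
connect_mirror_peers() {
    local peers_file="$1" addr failed=0
    while IFS= read -r addr || [ -n "$addr" ]; do
        case "$addr" in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so ipfs cannot consume the rest of the peers file.
        if ! ipfs swarm connect "$addr" < /dev/null; then
            echo "could not connect to $addr" >&2
            failed=1
        fi
    done < "$peers_file"
    return "$failed"
}
```

A peers file would contain entries of the form `/ip4/<address>/tcp/4001/p2p/<peer id>`, one for each mirror that is known to hold the archive.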

elopio · January 18, 2018, 3:37pm

Thanks for the suggestions @stebalien.

We are trying now with --local, and then we can experiment with badgerdb.
It says it will be done in 3.5 hours.

Edit: This progress bar is the worst liar. It has been running for a long time and now says 58.85%, 4h44m58s.

elopio · January 19, 2018, 4:51pm

We were left in a weird situation. This morning it was reporting less than 30 minutes to complete the add. But for some reason, the server got stuck, our byobu session was killed and all the IPFS processes stopped.

So, now how do we know if the add was completed? It would be sad to have to run the full recursive add again, because the files in the directory have not changed. But we have no clue if that’s the only way.

leerspace · January 19, 2018, 5:51pm

I don’t know how many things you have pinned to your node, but if it’s not too many you could look through the results of ipfs pin ls --type=recursive to see if any of the pins are for the content you were adding. By default ipfs add will recursively pin the content you added.

If you know what you’re looking for you can also search through the folders/files in the top-level hash using something like

ipfs pin ls --type=recursive -q | xargs -L 1 ipfs ls | grep "fubar"

rngkll · January 19, 2018, 10:55pm

Now we are running the add without the daemon running, and we get a lot of processes running; with the daemon we get 2 or 3 at most.

stebalien · January 26, 2018, 9:44am

Unfortunately, I believe that’s the only way. We have to re-read and re-hash the files to verify that they exist in the repo (we assume that they may have changed although we could probably relax this constraint for the filestore). Note, we won’t (or shouldn’t at least) actually write them to the repo again.

One way to avoid this would be to use MFS and add the files one-by-one. That is,

#!/bin/bash

set -e

FROM="$1"  # local directory
TO="$2"    # directory in MFS

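# -print0 is required because the loop reads NUL-delimited names (read -d '');
# the parentheses make -print0 apply to both files and directories.
find "$FROM" \( -type f -readable -o -type d -readable -executable \) -print0 | while IFS= read -r -d '' fname; do
    if [[ -d "$fname" ]]; then
        ipfs files mkdir -p -- "$TO/$fname"
    elif [[ -f "$fname" ]]; then
        # only the exit status matters here, so discard the listing itself
        if ! ipfs files ls "$TO/$fname" >/dev/null 2>&1; then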
            # will be pinned in the next command (you should probably disable GC)
            cid="$(ipfs add --pin=false --local -q "$fname")"
            ipfs files cp -- "/ipfs/$cid" "$TO/$fname"
        fi
    else
        echo "not a file or directory: $fname" >&2
        exit 1
    fi
done

Note, that script is rather slow… a better one would list the directory you want to import, the target MFS directory, find the diff, and then add the files in batches. However, writing that script is a bit of an endeavor.
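The "find the diff" step mentioned above could look roughly like this. It is a sketch under assumptions: `files_missing_in_mfs` is a hypothetical helper, both inputs are plain text files with one relative path per line (so paths containing newlines are not handled), and producing the MFS listing and adding the result in batches are left out.

```shell
#!/bin/sh
# files_missing_in_mfs LOCAL_LIST MFS_LIST
# Prints the paths that appear in LOCAL_LIST but not in MFS_LIST,
# i.e. the files that still have to be added.
files_missing_in_mfs() {
    local local_sorted mfs_sorted
    local_sorted="$(mktemp)"
    mfs_sorted="$(mktemp)"
    # comm needs both inputs sorted the same way; LC_ALL=C keeps the order stable.
    LC_ALL=C sort -u "$1" > "$local_sorted"
    LC_ALL=C sort -u "$2" > "$mfs_sorted"
    # -23 suppresses lines unique to MFS_LIST and lines common to both.
    LC_ALL=C comm -23 "$local_sorted" "$mfs_sorted"
    rm -f "$local_sorted" "$mfs_sorted"
}
```

The local list could be produced with something like `(cd "$FROM" && find . -type f | sed 's|^\./||')`; how to list the MFS directory recursively depends on the ipfs version in use.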

I’ve opened an issue to discuss adding a command to do this to ipfs:

FYI, the next release should make this a bit better. We figured out why adding large datasets causes problems with go-ipfs (we had a leak that has been fixed).

jbenet · June 21, 2018, 11:17am

Hey @elopio – can i get access to the archive? (or how to make it) i’d like to run some tests with rabin fingerprinting + badger ds, + see if it would make sense to write a custom importer to import the archive smartly, deduping all the internal files that are the same (look into ipfs tar for a preview of what i mean).

elopio · June 21, 2018, 3:10pm

Hello @jbenet,
We haven’t been able to ipfs add the full archive; it always fails at 99%. We have reported most of the errors we get. We were thinking of trying to add a subsection, to see if that works and lets us test further.

This is the script we are using to sync the repo and add it to ipfs: https://github.com/JaquerEspeis/apt-transport-ipfs/blob/master/scripts/sync_mirror.sh

Please let us know if there’s something we can do to help.

postables · June 22, 2018, 10:35pm

If I may throw in my two cents: if part of the issue that you’re having is uploading a very large tarball to IPFS, perhaps try fragmenting the archive and storing the various pieces? You can then use client-side logic to reconstruct the fragmented archive pieces into a single archive.
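The fragment-and-reassemble idea can be illustrated with standard tools. This is only a sketch of the general technique for a single large file: the function names and the `part.` prefix are made up for the example, and it does not involve ipfs at all.

```shell
#!/bin/sh
# fragment_archive FILE CHUNK_BYTES OUT_DIR
# Splits FILE into pieces of at most CHUNK_BYTES bytes named
# OUT_DIR/part.aaaa, OUT_DIR/part.aaab, and so on.
fragment_archive() {
    mkdir -p "$3" || return 1
    split -b "$2" -a 4 "$1" "$3/part."
}

# reassemble_archive OUT_DIR TARGET
# Concatenates the pieces back together; the shell expands the glob in
# sorted order, which matches the order in which split wrote them.
reassemble_archive() {
    cat "$1"/part.* > "$2"
}
```

Each piece could then be added and fetched independently, with the client concatenating them after download.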

DaniellMesquita · December 4, 2019, 5:35am

it would make sense to write a custom importer to import the archive smartly, deduping all the internal files that are the same (look into ipfs tar for a preview of what i mean).

Just an alt importer? It should be used in IPFS itself. More than that, IPFS could be compatible with git/pijul by using common objects instead of whole files. I said this when a user suggested pip, rpm and deb: Software Repository Mirrors may be a good start for IPFS - #3 by MirceaKitsune

DaniellMesquita · December 5, 2019, 10:18pm

I have an improved concept for deduplication: (draft) Common Bytes - standard for data deduplication

yeehi · October 2, 2020, 5:09am

This is a brilliant idea, elopio!

@lidel might be able to assist you: Alt-Svc

github.com/ipfs/in-web-browsers

Alt-Svc (HTTP Alternative Services)

opened 08:16PM - 29 Mar 19 UTC

lidel

topic/origin topic/http-gateway

`Alt-Svc` is an Internet Standard ([RFC7838](https://tools.ietf.org/html/rfc7838…)) which allow an origin's resources to be authoritatively available at a separate network location, possibly accessed with a different protocol configuration.

#### TL;DR

> The idea of `Alt-Svc` is for a website to be able to tell a client _"For technical reasons you don't need to care about, please talk to me using [this other web address]._"
>
> The client optionally does so. (They don't have to.) If they do so, they *do not* change the address bar or give any sort of visual indication to the user.
> – [src](https://trac.torproject.org/projects/tor/ticket/21952#comment:31)

## Potential IPFS Use

Websites could announce they are available over IPFS in a way that does not require additional DNS lookups.

#### /ipfs/

```console
$ curl -s -I -X GET https://bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq.ipfs.dweb.link/ | grep -i Alt-Svc
Alt-Svc: ipfs="bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq"; ma=315360000; persist=1
```

#### /ipns/

```console
$ curl -s -I -X GET https://wikipedia.org/ | grep -i Alt-Sv
Alt-Svc: ipns="wikipedia.org"; ma=3600; persist=1
```

Pros:
- Location bar kept intact (same Origin!)
- Following existing standard
- Prior Art exists, Tor Browser will use .onion address if announced by a website
- Enables smooth upgrade from HTTP to IPFS transport
- Can be cached HSTS-style
- No DNS TXT lookups

Cons:
- Location bar kept intact (needs additional indicator that IPFS was used)
- Requires initial hit to HTTP server
- Relies on native support in browser itself
- There is no API for WebExtension to register itself as a handler

### References

* [tools.ietf.org/html/rfc7838](https://tools.ietf.org/html/rfc7838)
* [mnot.net/blog/2016/03/09/alt-svc](https://www.mnot.net/blog/2016/03/09/alt-svc)
* [trac.torproject.org/projects/tor/ticket/21952#comment:31](https://trac.torproject.org/projects/tor/ticket/21952#comment:31)
* Prior Art
  - Tor Browser supports `Alt-Svc`
  - Brave planning to support it as well (https://github.com/brave/brave-browser/issues/1121)
  - Website Announcing it is Available over Tor
    ```console
    $ curl -s -I -X GET https://tor.cloudflare-dns.com/ | grep -i Alt-Svc
    Alt-Svc: h2="dns4torpnlfs2ifuz2s2yf3fc7rdmsbhm6rw75euj35pac6ap25zgqad.onion:443"; ma=315360000; persist=1
    ```

There is a way to indicate that a file is available via HTTP or some other protocol, like Tor or IPFS: Alt-Svc

Alt-Svc is an Internet Standard (RFC7838) which allow an origin’s resources to be authoritatively available at a separate network location, possibly accessed with a different protocol configuration.

TL;DR

The idea of Alt-Svc is for a website to be able to tell a client "For technical reasons you don’t need to care about, please talk to me using [this other web address]."

The client optionally does so. (They don’t have to.) If they do so, they do not change the address bar or give any sort of visual indication to the user.

If you contacted Ubuntu (and Debian GNU/Linux!) and asked them to support this protocol in the repos, everybody would benefit. IPFS users would have faster downloads. HTTP users would experience less load.
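On the client side, picking up such an announcement could be as simple as reading the response headers. The sketch below assumes the `ipfs="..."` entry proposed in the linked issue, which is a proposal and not part of RFC 7838 itself; `alt_svc_ipfs_cid` is a hypothetical helper name.

```shell
#!/bin/sh
# alt_svc_ipfs_cid
# Reads HTTP response headers on stdin and prints the value of the first
# ipfs="..." entry found in an Alt-Svc header. Prints nothing if there is none.
alt_svc_ipfs_cid() {
    # Header names are case-insensitive and header lines end in CRLF.
    tr -d '\r' \
        | sed -n 's/^[Aa][Ll][Tt]-[Ss][Vv][Cc]:.*ipfs="\([^"]*\)".*/\1/p' \
        | head -n 1
}
```

It could be fed from `curl -s -I -X GET <url> | alt_svc_ipfs_cid`; a client that gets a CID back may then try fetching over IPFS and fall back to HTTP otherwise.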
