Why are files stored in base64 for IPLD Git?

I am working on an app for distributed proofreading. Git in IPFS seems like a perfect solution. It would let users work on a document collection with whatever tools they like and then commit changes back into the swarm.

The ereader I am using, Readium, can ingest “exploded” ebooks, which are simply the unzipped contents of an EPUB file.

The file structure produced by git-remote-ipld aligns with the spec, and blobs are stored as base64 with a size header. So there is a DAG entry at refs/heads/master/tree with all the files, just in a format I can’t read.

My ideal piece of software would allow users to check out the repo and also allow Readium to browse the contents of the head branch.

The data is already there in the current structure, just encoded so that a browser can’t read it. Would it be reasonable to make it accessible?

Also, has git support been removed from recent versions of js-ipfs? When I run my code with v0.34 it works, but with v0.42 I get the error: No resolver found for codec "git-raw". With v43 of the HTTP client I get: Missing IPLD format "git-raw".

I’ve been reading through the code and, if I understand correctly, the preface and base64 encoding are there because they are required to get the same hash that git uses internally for the block.
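To make the hashing point concrete, here is a minimal sketch of how git derives a blob's id: it takes the SHA-1 of a `blob <size>\x00` header followed by the raw content. The function name `git_blob_id` is made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    # Git hashes the header and the content together, not the content alone.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known id in every git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Dropping or changing the header, or splitting the content into several pieces, yields a different digest, so the block has to be stored whole to keep git's id.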

This is also one of the big roadblocks, as the file can’t be sharded without changing the hash.

Yes, the current limit for a single block is 2 MB internally and 1 MB via ipfs add etc.

The reasoning behind this is that overly large blocks can be used for DoS: if you send wrong data to a node, it has to invest a lot of processing time just to drop the data again and redownload it.

This means you cannot import a git repository that has large files, or commit large files with git.
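One way to check whether a repository would hit this limit is to scan the working tree for files whose blob would be too big. This is only a sketch: it assumes the 1 MB figure means 1 MiB (1024 * 1024 bytes), the constant is taken from the number quoted above rather than read from IPFS, and `oversized_files` is a hypothetical helper name.

```python
import os
import tempfile
from typing import List, Tuple

# Assumption: the 1 MB limit quoted above, interpreted as 1 MiB.
BLOCK_LIMIT = 1024 * 1024


def oversized_files(root: str, limit: int = BLOCK_LIMIT) -> List[Tuple[str, int]]:
    """Return (relative path, file size) for files whose git blob exceeds limit."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own object store.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            # A blob is "blob <size>" plus a NUL byte plus the content.
            blob_size = len(f"blob {size}".encode("ascii")) + 1 + size
            if blob_size > limit:
                found.append((os.path.relpath(path, root), size))
    return sorted(found)


# Usage: build a throwaway tree with one small and one oversized file.
with tempfile.TemporaryDirectory() as repo:
    with open(os.path.join(repo, "chapter1.xhtml"), "wb") as f:
        f.write(b"<html/>")
    with open(os.path.join(repo, "cover.png"), "wb") as f:
        f.write(b"\x00" * (BLOCK_LIMIT + 1))
    print(oversized_files(repo))
```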

You can of course use git to store just pointers to files: basically a text file containing a CID that points to the actual file.
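As an illustration, such a pointer could be a tiny text file that git tracks in place of the real content. The format below (a `cid` line plus a `size` line) and the helper names are hypothetical, loosely modelled on git-lfs pointer files, and the CID in the usage is a placeholder rather than a real content identifier.

```python
from typing import Tuple


def make_pointer(cid: str, size: int) -> str:
    """Render a pointer file body (hypothetical format, not a standard)."""
    return f"cid {cid}\nsize {size}\n"


def parse_pointer(text: str) -> Tuple[str, int]:
    """Parse a pointer produced by make_pointer back into (cid, size)."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    return fields["cid"], int(fields["size"])


# PLACEHOLDER_CID stands in for the CID returned when the large file is added.
pointer = make_pointer("PLACEHOLDER_CID", 5_000_000)
print(pointer, end="")
```

The pointer stays a few dozen bytes no matter how large the referenced file is, so it fits in a single block.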


Turns out someone is currently working on experimental git-lfs support:

I’ve been digging into git-remote-ipld and it has a feature where large objects are stored in a regular IPFS filesystem with the hash as the filename.

I took that concept and put all blobs in the filesystem with the blob *size*\x00 header removed.
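Stripping the header is a small operation. The sketch below assumes the blob arrives as base64 of the full git object (`blob <size>\x00` followed by the content), as described earlier in this thread; the function name `blob_content` is made up for illustration.

```python
import base64


def blob_content(encoded: str) -> bytes:
    """Decode a base64 git blob object and return the content without its header."""
    raw = base64.b64decode(encoded)
    # The header ends at the first NUL byte; everything after it is the file.
    header, sep, body = raw.partition(b"\x00")
    if not sep or not header.startswith(b"blob "):
        raise ValueError("not a git blob object")
    declared = int(header[len(b"blob "):].decode("ascii"))
    if declared != len(body):
        raise ValueError("size header does not match content length")
    return body


encoded = base64.b64encode(b"blob 6\x00hello\n").decode("ascii")
print(blob_content(encoded))  # b'hello\n'
```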

That lets me generate a *hash*/content/ folder with the current state of master.

There’s a problem, though, with using the git file structure at all: pinning. The git objects are inserted using BlockPut, which doesn’t pin them. More importantly, when someone else pins the root hash, it will only pin up to the root commit. None of the associated files will get pinned, since the structure is opaque to IPFS.

It seems like it would make more sense to just put the entire object structure into a regular file system and use the git SHA-1 hashes as unique IDs. Then IPFS doesn’t need to understand git at all for push, fetch, and pinning to work.
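A rough sketch of that layout, with a local directory standing in for the IPFS filesystem. The `objects/<sha1>` path scheme and the function names are assumptions made for illustration, not what git-remote-ipld does today.

```python
import hashlib
import tempfile
from pathlib import Path


def store_object(root: Path, kind: str, content: bytes) -> str:
    """Write one git object under root/objects/<sha1> and return the SHA-1."""
    raw = f"{kind} {len(content)}".encode("ascii") + b"\x00" + content
    oid = hashlib.sha1(raw).hexdigest()
    target = root / "objects" / oid
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(raw)
    return oid


def load_object(root: Path, oid: str) -> bytes:
    """Read an object back and check that it still matches its SHA-1 name."""
    raw = (root / "objects" / oid).read_bytes()
    if hashlib.sha1(raw).hexdigest() != oid:
        raise ValueError(f"object {oid} is corrupt")
    return raw


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    oid = store_object(root, "blob", b"hello world\n")
    print(oid, len(load_object(root, oid)))
```

Because every object is then an ordinary file under one directory, a recursive pin of that directory's root should cover all of them, and files above the block size could be chunked by IPFS without affecting the git id stored in the filename.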