How to modify small sections of a large file stored in IPFS?

AFAIK, it is impossible to modify a file stored in IPFS.
Technically, we can re-upload the file to IPFS to simulate the “modify” operation.

But the problem arises whenever we want to modify a small section of a large file, say 10GB. We have to re-upload the whole thing and waste a lot of computation on duplicate blocks just to add a few new ones. Although data deduplication saves a huge amount of space on duplicate blocks, we still waste time calculating the hashes of those duplicated blocks.

There is also an awesome application called Peergos.

As Peergos describes on its official page (Features):

Peergos can handle arbitrarily large files efficiently. Our maximum file size is far bigger than any other storage provider we are aware of (assuming you have enough space on the server). We can stream large files like videos and start playing immediately, or quickly skip though to a later part. Despite being end-to-end encrypted, we can efficiently modify small sections of large files.

They claim that Peergos can modify small sections of large files efficiently.

Does anyone have any idea how they implement it, or any good idea for solving the problem described above?

It seems I have found the document that explains how they handle this.

Below is the link to the page that explains how Peergos modifies small sections of a large file.

IPFS chunks your files into blocks and builds a Merkle-DAG to content-address them (https://www.youtube.com/watch?v=Z5zNPwMDYGg).

I don’t know what Peergos does, but “editing” a file efficiently would involve keeping track of which blocks you modified, and adjusting all the DAG nodes/branches affected up to the root.
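
To make the "adjust the affected branches up to the root" idea concrete, here is a minimal sketch. It assumes a plain binary hash tree over fixed-size chunks, which is a simplification for illustration only: the real IPFS UnixFS DAG uses wider nodes and a different encoding, and the names `build_levels` and `update_chunk` are made up for this example.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(chunks):
    """Hash every chunk, then pair hashes upward until one root remains.

    Returns a list of levels: levels[0] holds the leaf hashes and
    levels[-1] holds the single root hash.
    """
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        level = [h(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def update_chunk(levels, index, new_chunk):
    """Replace one leaf and rehash only its ancestors.

    Returns how many hashes had to be computed.
    """
    levels[0][index] = h(new_chunk)
    hashed = 1
    for depth in range(1, len(levels)):
        index //= 2  # position of the parent in the level above
        below = levels[depth - 1]
        levels[depth][index] = h(b"".join(below[2 * index:2 * index + 2]))
        hashed += 1
    return hashed


# 64 chunks of 1 KiB each (toy sizes, chosen only for the demo)
chunks = [bytes([i]) * 1024 for i in range(64)]
levels = build_levels(chunks)
old_root = levels[-1][0]

chunks[10] = b"edited" * 100
hashed = update_chunk(levels, 10, chunks[10])
print("hashes recomputed:", hashed, "of", sum(len(level) for level in levels))
print("root changed:", levels[-1][0] != old_root)
```

The point is that the work grows with the depth of the tree, not with the size of the file, as long as you already have the old tree and know which chunk changed.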


Hi hector,

Thanks for your fast response.

If I understand correctly, go-ipfs does not yet offer this kind of modification, right?
At the current stage we can only re-upload the file, unless we implement it ourselves using the provided low-level API. That means we still have to walk through the whole file and calculate its final hash value.

Well, you can mount IPFS-MFS as a filesystem and you can modify files there as you wish. Other than that, using the HTTP APIs, it would be moderately painful. Writing a program in Go to write some contents given a root hash and an offset might be the saner way of doing things.

Peergos says they chunk files into 5MB blocks. The challenge with any fixed-size chunking algorithm is that if you insert, for example, one single byte at the very front of the file, every chunk shifts by one byte, so the hash of every chunk changes. That destroys the ability to reuse chunks, and the effect is the same as storing a second full copy of the file, because all the chunks are new.
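
A small experiment shows the difference between overwriting a byte in place and inserting one at the front. This is only a sketch with toy sizes (4 KiB chunks over 64 KiB of random data, not Peergos's 5MB blocks), and `fixed_chunks` and `chunk_ids` are illustrative helpers, not part of any IPFS or Peergos API.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int):
    """Split data at every multiple of `size`, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_ids(chunks):
    """Content addresses of the chunks (SHA-256 here, for illustration)."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
size = 4096

before = chunk_ids(fixed_chunks(original, size))

# Overwrite one byte in place: only the chunk containing it changes.
flipped = bytes([original[100] ^ 0xFF])
overwritten = original[:100] + flipped + original[101:]

# Insert one byte at the front: every chunk boundary shifts.
inserted = b"X" + original

reused_overwrite = len(before & chunk_ids(fixed_chunks(overwritten, size)))
reused_insert = len(before & chunk_ids(fixed_chunks(inserted, size)))

print("chunks before:", len(before))
print("reused after in-place overwrite:", reused_overwrite)
print("reused after one-byte insert:", reused_insert)
```

So fixed-size chunks cope well with in-place overwrites, but an insert or delete near the start invalidates everything after it.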

I think I read somewhere that RSYNC has some kind of intelligent way of choosing their chunks, that isn’t simply at the boundary line of every chunk block size (but uses the data itself to intelligently choose block boundaries), and is based on the known need to reuse blocks.

So the key question is can IPFS do this kind of intelligent chunk selection? …because if so then theoretically it could be efficient in modifying large files and accomplishing also the equivalent of RSYNC where only small data transfers are required to sync large files/folders.

IPFS can use Rabin and buzhash chunkers (--chunker option in ipfs add). These are meant to do that: magically find the right block boundaries to increase deduplication if possible. Of course, depending on the input, they will work better or worse. When using them, blocks will not be of a fixed size as they are by default.
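
For anyone wondering how a chunker can "find" boundaries, here is a toy content-defined chunker. It uses a simple gear-style rolling hash that I picked for brevity; it is not the Rabin or buzhash implementation that IPFS ships, and the mask, minimum size, and maximum size are arbitrary assumptions for the demo.

```python
import hashlib
import random

# Fixed pseudo-random table mapping each byte value to a 32-bit number.
_table_rng = random.Random(42)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, mask=0x3FF, min_size=256, max_size=8192):
    """Cut wherever the rolling hash of the most recent bytes matches a pattern.

    Because the cut decision depends on nearby content rather than on the
    absolute offset, boundaries tend to move along with the data.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (rolling & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_ids(chunks):
    return {hashlib.sha256(c).hexdigest() for c in chunks}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(128 * 1024))
inserted = b"X" + original  # the same one-byte insert at the very front

before = chunk_ids(cdc_chunks(original))
after = chunk_ids(cdc_chunks(inserted))
reused = len(before & after)

print("chunks before:", len(before))
print("chunks reused after one-byte insert:", reused)
```

With random input like this, the boundaries should line up again soon after the inserted byte, so nearly all chunks are reused. Real files can behave better or worse, as noted above.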

That’s awesome news to hear Hector! Thanks for clarifying. I was researching recently how MFS could be made to ‘simulate’ or ‘accomplish’ something approaching the efficiency of an rsync and so it’s good to know this might work well.

How do I do that? I could find nothing about it anywhere, other than that it is currently unimplemented:

I was wrong: ipfs mount only mounts a read-only filesystem, so interfacing with the ipfs files commands might be the best option.
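
As a rough sketch of what that could look like, the snippet below builds the ipfs files command lines for copying an existing file into MFS, overwriting bytes at an offset, and reading back the new root hash. The CID, MFS path, and patch file name are hypothetical placeholders, and `mfs_patch_commands` and `run_all` are helper names invented here. I have not measured how much rehashing the write actually avoids, so treat this as a starting point rather than a confirmed solution.

```python
import subprocess


def mfs_patch_commands(root_cid: str, mfs_path: str, offset: int, patch_file: str):
    """Build the three ipfs CLI invocations for an offset write through MFS."""
    return [
        # Reference the existing file from MFS (no data is copied).
        ["ipfs", "files", "cp", f"/ipfs/{root_cid}", mfs_path],
        # Overwrite bytes starting at `offset` with the contents of patch_file.
        ["ipfs", "files", "write", f"--offset={offset}", mfs_path, patch_file],
        # Print the hash of the modified file.
        ["ipfs", "files", "stat", "--hash", mfs_path],
    ]


def run_all(commands):
    """Run each command in order; requires a local ipfs daemon and binary."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Placeholder values, for illustration only.
commands = mfs_patch_commands("QmExampleRootCid", "/big-file.bin", 1048576, "patch.bin")
for argv in commands:
    print(" ".join(argv))
```

Calling `run_all(commands)` would execute them against a running node.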

Thanks. By the way, I asked about the status of write support, and here is the answer from its author, @djdv: