IPFS file sharing

Hi… I’d really appreciate your input on this.

Question 1.

I upload a file (200 MB) to IPFS. It gets stored locally on my machine (node0).

Another node (node1) asks for this file by its root hash. How does the file sharing work? Does my node (node0) share it chunk by chunk? In other words, if I turn off node1 before it has fetched the whole file and then turn it back on, will it have some chunks stored in its cache, but not all of them?

Question 2.

It seems that if a file is small and one node fetches it from another, and I let the transfer finish, both nodes end up storing the whole file. If that’s true, it means that if I upload something to IPFS, any node that requests it can see the file and what’s in it. Isn’t this a problem? I read somewhere that Filecoin never stores the whole file on one node because it doesn’t want any node to see the whole file…

Question 3.

If I upload a movie to IPFS and then request it on another node, will it automatically start playing the movie (that is, play it while downloading more and more chunks)? If that’s true, I guess file sharing works by fetching chunks SEQUENTIALLY, correct? I think that’s how torrents work too.

#1. Yes, Node1 will get stuff one chunk at a time.

#2. Encryption is how you keep data private if needed

#3. I’m pretty sure that if you open a file and start streaming its data, playback can start immediately, without first downloading, for example, chunks from the tail end of the file. But there may be no guarantee about whether it also tries to load other parts in the background as they’re found.
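To make the answer to #1 a bit more concrete, here is a minimal sketch of why an interrupted download leaves only some blocks in the cache. It is a toy model, not how IPFS is implemented: blocks are identified by a plain SHA-256 hex digest (real IPFS uses CIDs linked in a DAG), the "nodes" are just dictionaries, and the names split_into_blocks and fetch are made up for illustration.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # assumed chunk size, for illustration only


def split_into_blocks(data, chunk_size=CHUNK_SIZE):
    """Return a list of (block_id, block_bytes); the id is the SHA-256 hex digest."""
    blocks = []
    for offset in range(0, len(data), chunk_size):
        block = data[offset:offset + chunk_size]
        blocks.append((hashlib.sha256(block).hexdigest(), block))
    return blocks


def fetch(wanted_ids, remote, local, max_blocks=None):
    """Copy missing blocks from remote to local and return how many were fetched.

    max_blocks simulates the receiving node being switched off early.
    """
    fetched = 0
    for block_id in wanted_ids:
        if block_id in local:
            continue  # already cached from an earlier, interrupted run
        if max_blocks is not None and fetched >= max_blocks:
            break  # "node1 was turned off here"
        block = remote[block_id]
        # Content addressing: a block is only accepted if it matches its id.
        if hashlib.sha256(block).hexdigest() != block_id:
            raise ValueError("block does not match its hash")
        local[block_id] = block
        fetched += 1
    return fetched


# Tiny demo with a 4-byte chunk size so the numbers stay readable.
blocks = split_into_blocks(b"abcdefghij", chunk_size=4)
ids = [block_id for block_id, _ in blocks]
node0 = dict(blocks)  # the node that holds the whole file
node1 = {}            # the node that is downloading

fetch(ids, node0, node1, max_blocks=2)  # interrupted: node1 holds 2 of 3 blocks
fetch(ids, node0, node1)                # resumed: only the missing block moves
```

After the first call node1 holds two of the three blocks, and the second call only transfers the one that is still missing.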

@wclayf Thanks a lot

  2. Yes, but if I encrypt the file, any node that receives it and knows how it was encrypted can use the same algorithm to decrypt it. Not ideal, but much better…

  3. So what you mean is that it will download chunk by chunk, sequentially, as the stream goes on, but it might also download some chunks from the middle of the file, correct?

On #3, what I’m trying to say is that I don’t think there’s any guarantee about the behavior, and assuming any particular behavior in your own designs would be a mistake, imo.

@wclayf So this means that I can’t be sure whether it’s possible at all to stream a movie directly with IPFS… correct?

You should be able to stream just fine, although it will of course depend on how much bandwidth and storage your gateway has. I don’t know how most pinning services deal with large video files, but the IPFS system itself can handle streaming just fine.

@wclayf

This is what you mentioned which currently doesn’t make sense to me anymore, sorry.

So the assumption that it starts downloading and streaming chunk by chunk is correct. I don’t know what guarantees you’re mentioning.

IPFS might be reading the head of the file and the tail end of the file at the same time. That’s all I was saying. Once it starts accessing any part of a file, it may run some other daemon thread in the background to get many other parts of the file too. You don’t know, and shouldn’t care.

If I upload a movie to IPFS and then request it on another node, will it automatically start playing the movie (that is, play it while downloading more and more chunks)? If that’s true, I guess file sharing works by fetching chunks SEQUENTIALLY, correct? I think that’s how torrents work too.

There are 2 ways to link chunks together (selected with the --trickle option of ipfs add):

  • Without it (merkle-dag, the default; ipfs calls this layout balanced), it creates a well-rounded tree that is well suited to random access and high-speed download.

    (In the diagram that went with this, Qmfoo is the hash you share with other users; Qmbar, Qmfaa and Qmfee are intermediate blocks, similar to Qmfoo; all the other blocks (the bottom ones) actually contain the different shards of the file.)
  • With it (trickle-dag), it lines things up. You should get a shorter initial delay while loading the head of the file, but you should also expect atrocious random-access performance. Also note that this layout currently has a speed limit: merkle’s speed is exponential, since the more blocks you have fetched, the more you can fetch (until you reach the data blocks), whereas trickle’s speed is constant, since fetching an intermediate block only reveals one new intermediate block. However, there will be fewer link lists to store and the overall size will be smaller.
    (In that diagram, Qmdoo is the hash you would share. Also note that even though the data is stored in the same underlying blocks, the final hash is different.)

The overhead is much smaller than my example suggests. I don’t think there is any reason to use trickle unless you really need to save a few KB on multi-gigabyte files, since merkle has exponential theoretical speed (btw, the speed difference may be partially fixed in the future by graphsync, but that’s still WIP).
Also, a chunk is 256 KiB or less, so if your file is smaller than 256 KiB it isn’t even chunked and is just used as is.
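As a rough back-of-the-envelope sketch of the numbers above: the snippet below counts how many data chunks a file splits into and how many levels of intermediate blocks a balanced tree needs on top of them. The 256 KiB chunk size comes from this post; the fan-out of 174 links per intermediate block is my assumption about the default, and the function names are made up. It says nothing about the trickle layout.

```python
import math

CHUNK_SIZE = 256 * 1024  # bytes per chunk, as stated in the post
MAX_LINKS = 174          # assumed number of links per intermediate block


def leaf_count(file_size, chunk_size=CHUNK_SIZE):
    """Number of data chunks for a file of file_size bytes (at least one)."""
    return max(1, math.ceil(file_size / chunk_size))


def balanced_depth(leaves, fanout=MAX_LINKS):
    """Levels of intermediate blocks needed above the leaves in a balanced tree."""
    depth = 0
    nodes = leaves
    while nodes > 1:
        nodes = math.ceil(nodes / fanout)  # each parent links up to fanout children
        depth += 1
    return depth


leaves = leaf_count(200 * 1024 * 1024)  # a 200 MiB file gives 800 chunks
levels = balanced_depth(leaves)         # 800 -> 5 -> 1, so 2 levels of links
```

Under these assumptions a 200 MiB file is 800 chunks reachable through two levels of links, which is why the extra blocks are such a small share of the total size. A file under one chunk needs no intermediate blocks at all.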

sharing works getting chunks SEQUENTIALLY

This depends. I think that’s what ipfs cat currently does, but if you use partial content on a gateway (such as when watching a movie in your browser through a gateway) you can lazily load parts of a file on demand (like streaming sites do, loading only the next few minutes). In theory it would also be possible to download in random order if you want (like most torrent implementations do); this may even be what ipfs get and ipfs pin do, but I’m unsure.

i read somewhere that with filecoin, it doesn’t store the whole file at one node ever because it doesn’t want any node to see the whole file…

Idk where you read this, but it’s just wrong. Sending only parts of a file isn’t an effective way to protect against that (encryption is): even if you receive only half of a file, that half might still contain keys, passwords or other data you don’t want others to see. The actual reason Filecoin splits files is so that if a node drops out you lose far less of your data (Filecoin also uses parity to over-store files, so it can store something like 3 copies of the file (including parity), and you can lose 2/3 of your nodes and still recover the original file).

Yes, but if I encrypt, and whichever node receives a file and knows how it was encrypted, then it can use the same algorithm to decrypt it. Not an ideal, but much better…

This depends on what you do.
IPFS shares plain files without caring much about their content or what you do with them, so it’s up to you how you encrypt them.

You should probably encrypt using a keyed algorithm (encryption with password) such as AES.

Example with openssl on Linux: I’ll encrypt the file test with the password test and add it, then fetch it and decrypt it (note that crypto is hard and there are many issues with this example; I wouldn’t use it like this in the wild, as you could leak part of your key or make brute-forcing much easier):

$ cat test | openssl enc -aes-256-cbc -pass pass:test | ipfs add --pin=false -Q -
QmV3nWXuS3TgsWMkwUNSx4MVN4z4rBagdqBtN2fhMERvjp
$ ipfs cat QmV3nWXuS3TgsWMkwUNSx4MVN4z4rBagdqBtN2fhMERvjp | openssl enc -d -aes-256-cbc -pass pass:test
Hey :)

If this was done correctly, you could change the password to something long that no one knows, and even if others could see that it’s AES-CBC, they wouldn’t be able to do anything without the key. Obviously the key must be shared OOB (out-of-band, so not publicly on IPFS; the key is private and should stay that way).
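On the brute-forcing point, one thing that can help is turning the password into a key with a deliberately slow, salted derivation. Here is a minimal sketch using only the Python standard library. PBKDF2-HMAC-SHA256, the 16-byte salt, the iteration count and the name derive_key are my choices for illustration, not something taken from the example above. It only derives the key; the encryption itself would still need a vetted crypto library.

```python
import hashlib
import os

ITERATIONS = 600_000  # assumed work factor; higher is slower to brute-force


def derive_key(password, salt, iterations=ITERATIONS):
    """Derive a 32-byte key (the AES-256 key size) from a password."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random but not secret; it has to be kept next to the ciphertext
# so that the receiver can derive the same key from the shared password.
salt = os.urandom(16)
key = derive_key("a long passphrase shared out-of-band", salt)
```

The same password with the same salt always gives the same key, so the receiver only needs the passphrase (shared out-of-band) and the salt (which can travel with the file).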

So the assumption that it starts downloading and streaming chunk by chunk is correct. I don’t know what guarantees you’re mentioning.

It might or it might not, it’s up to what you do with it.

You can do whatever you want depending on what you need. Sequential can have less overhead and a shorter delay to meaningful data; random is faster and more reliable.

If you just want to stream movies on top of IPFS, that already works perfectly. You just need a browser that uses partial content on top of a gateway (such as Firefox, and likely Chrome too).

Just open any video file through a gateway, or with <video>, and you will be lazily loading parts of the file (streaming it). If you want to run IPFS in the browser, I’m not sure; I know very little about js-ipfs.
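For reference, "partial content" here means an ordinary HTTP Range request, which is what the browser sends for a video. Below is a small sketch of building such a request by hand. The gateway address assumes a local node with its gateway on port 8080, the CID is a placeholder, and build_range_request is a made-up helper name. The request is only built, not sent.

```python
import urllib.request

GATEWAY = "http://127.0.0.1:8080"  # assumed: local gateway address
CID = "QmExampleCidPlaceholder"    # hypothetical CID, replace with a real one


def build_range_request(cid, first_byte, last_byte, gateway=GATEWAY):
    """Build (but do not send) a GET for bytes first_byte..last_byte inclusive."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    request = urllib.request.Request(f"{gateway}/ipfs/{cid}")
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


request = build_range_request(CID, 0, 1_048_575)  # the first MiB of the file

# Sending it needs a running gateway and a real CID:
# with urllib.request.urlopen(request) as response:
#     data = response.read()  # a server that honours the range answers 206
```

A video player does the same thing repeatedly with different ranges as playback moves on or the user scrubs.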

Extra notes

Movie codecs aren’t sequential; they are mostly random. This varies with what you use, but the simplest ones usually start with a header containing some metadata, then a video track, then an audio track (or the two swapped). So while playing a video you read a tiny bit of data at the start, then read 2 tracks at different places, which is mostly random access (and scrubbing in the timeline makes it fully random). Some more complex codecs and containers are even more random, which makes merkle-dag better for most of them.

@Jorropo Thanks a lot for such a thorough explanation.

To sum up, you mention that watching a movie means reading chunks sequentially, because otherwise it wouldn’t be possible to watch a movie in a stream on IPFS, but I think you also mention that while it’s downloading chunks sequentially, it also downloads some other chunks in other parts of the video as long as it can. WDYT ?

you mention that watching a movie means reading chunks sequentially

I was trying to say the opposite. The reality is it depends, some formats are more sequential than others.

I think all of this sequential vs. random doesn’t matter and you shouldn’t care about it.
The reality is that unless you want to save a few MB on multi-gigabyte files (so ~0.01% of the file size), you should use merkle-dag, because it handles both sequential and random access very well (at the cost of a ~0.01% bigger file). Since it does both correctly, there is no need to worry about anything, whereas sequential (trickle-dag) isn’t capable of high-speed transfer and is atrocious at random access.

reading chunks sequentially, because otherwise it wouldn’t be possible to watch a movie in a stream on IPFS

IPFS manages random access just fine (at a slight latency cost) if the file you are downloading has been chunked with merkle-dag (which is the default).
IPFS manages sequential access just fine too.
I guess I should have been clearer, but all of this doesn’t matter; IPFS is good in both cases :)

it also downloads some other chunks in other parts of the video as long as it can

Your browser will be doing this; IPFS then just follows what your browser asks for.
In practice your browser is going to ask for bytes 150000000 to 180000000 and bytes 100000 to 150000 at the same time. IPFS will then do its best to provide those bytes, fetching only the needed chunks from other nodes (in my example, all blocks between 150000 and 150000000 would simply not be downloaded).
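To put numbers on that example, here is a small sketch that maps the two byte ranges to chunk indices. It assumes fixed 256 KiB chunks and treats both ends of a range as inclusive (as HTTP ranges are); the function names are made up for illustration.

```python
CHUNK_SIZE = 256 * 1024  # assumed fixed chunk size in bytes


def chunks_for_range(first_byte, last_byte, chunk_size=CHUNK_SIZE):
    """Chunk indices covering bytes first_byte..last_byte inclusive."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    return range(first_byte // chunk_size, last_byte // chunk_size + 1)


def chunks_needed(byte_ranges, chunk_size=CHUNK_SIZE):
    """Sorted list of distinct chunk indices needed for all the given ranges."""
    needed = set()
    for first_byte, last_byte in byte_ranges:
        needed.update(chunks_for_range(first_byte, last_byte, chunk_size))
    return sorted(needed)


needed = chunks_needed([(150_000_000, 180_000_000), (100_000, 150_000)])
# needed is chunk 0 plus chunks 572 through 686, 116 chunks in total
```

Under these assumptions the small range fits entirely in chunk 0 and the large one spans chunks 572 to 686, so chunks 1 to 571 are never requested.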

@Jorropo

I think I said the same thing.

So what I mean is that when opening a movie file, it starts downloading chunks sequentially and also randomly. Whether it’s the browser asking for sequential bytes or IPFS doing it, the download still has to happen sequentially, otherwise watching a movie in real time wouldn’t be possible. But the idea is that it also downloads other parts of the file (at the end, in the middle, or wherever, since it’s random).

Agreed ? or am I still misunderstanding something ?

Yes, seems fine to me :)