Are 100GB sequential writes possible with 80 servers pulling from a seeded non-IPFS host?

From @gerrickw on Wed Mar 30 2016 21:07:31 GMT+0000 (UTC)

I'm trying to find out whether my use case is possible, what methods would make it possible, or whether this would be a bad solution. This is really several questions, but all toward the same goal.

Use case: I would have 80 servers attempting to download a 100GB file as quickly as possible in an intranet environment.

Questions:

  1. While downloading, is it possible to force sequential writes for spinning disks given a 1GB memory limit (so not much room for disk cache)? Since the disks are not SSDs, sequential writes would help disk I/O.
  2. Is an 80-server swarm possible where each server can pull a chunk from any of the other servers that have already downloaded it from a single seed? If all 80 servers start downloading at once, I would hope they spread the download load among themselves.
  3. Is it possible for the 80 servers to start the download from a seed that does not have IPFS installed, such as content hosted over HTTP, HTTPS, or a Hadoop connection? Or could metadata files such as checksums, block lists, etc. be generated beforehand?
  4. If number 3 is not possible, can a host add a file to IPFS and let others start fetching before the add has finished?

Thanks for any help people can supply – even answering a single question will be helpful.


Copied from original issue: https://github.com/ipfs/faq/issues/105

From @whyrusleeping on Wed Mar 30 2016 22:37:03 GMT+0000 (UTC)

1. I’m not super certain what you mean by sequential writes. Do you mean fetching each block in order?
2. Yes, that is possible and how things should happen. Currently bitswap isn't the smartest protocol, so there may be some wasted bandwidth going around (nodes receiving the same block from multiple other peers; there's a toy illustration of this below the list). But better bitswap strategies will improve this dramatically moving forward.
3. No. Content requested through IPFS needs to exist in IPFS before (or after) it's requested. IPFS can't pull content from an external source automatically. (yet?)
4. Currently we cannot do ‘streaming’ adds: since the IPFS data structure is a merkledag, you need the entire structure before you can know the hash of its root (the second sketch below makes this concrete). You could, however, stream each chunk hash out of band to the other servers so they can start fetching. This functionality is planned, but not yet implemented.
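
For (2), here's a toy illustration of the duplicate-block problem (made-up peer names and CID, not the real bitswap wire protocol): with a naive broadcast wantlist, every peer that has a block may send it, and the receiver keeps only the first copy.

```go
package main

import (
	"fmt"
	"sync"
)

// reply is a block sent back by a peer in response to a broadcast want.
type reply struct {
	peer string
	cid  string
}

func main() {
	peers := []string{"peer-A", "peer-B", "peer-C", "peer-D"}
	want := "QmExampleBlock" // hypothetical CID, for illustration only

	// Naive strategy: broadcast the want to every peer, with no
	// coordination about who should actually send the block.
	replies := make(chan reply, len(peers))
	var wg sync.WaitGroup
	for _, p := range peers {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			// Assume every peer already has the block and every one replies.
			replies <- reply{peer: p, cid: want}
		}(p)
	}
	wg.Wait()
	close(replies)

	// The receiver stores the first copy of each CID and discards the
	// rest; the discarded copies are the wasted bandwidth.
	seen := make(map[string]bool)
	dupes := 0
	for r := range replies {
		if seen[r.cid] {
			dupes++
			continue
		}
		seen[r.cid] = true
		fmt.Printf("stored %s from %s\n", r.cid, r.peer)
	}
	fmt.Printf("duplicate copies discarded: %d\n", dupes)
}
```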
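
And for (4), a minimal sketch of why the root comes last, using plain SHA-256 over concatenated chunk hashes (the real merkledag uses protobuf nodes and multihashes, so this is an analogy rather than IPFS's actual format):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashChunk hashes one leaf block. Each of these is known as soon as
// that chunk has been read, so they could be streamed out of band.
func hashChunk(data []byte) [32]byte {
	return sha256.Sum256(data)
}

// rootHash derives the root from every chunk hash. Nothing here can be
// computed until the final chunk has been hashed, which is why a
// streaming add cannot announce its root hash up front.
func rootHash(chunkHashes [][32]byte) [32]byte {
	h := sha256.New()
	for _, ch := range chunkHashes {
		h.Write(ch[:])
	}
	var root [32]byte
	copy(root[:], h.Sum(nil))
	return root
}

func main() {
	chunks := [][]byte{[]byte("chunk-0"), []byte("chunk-1"), []byte("chunk-2")}
	hashes := make([][32]byte, len(chunks))
	for i, c := range chunks {
		hashes[i] = hashChunk(c)
		fmt.Printf("chunk %d hash known immediately: %x\n", i, hashes[i][:8])
	}
	root := rootHash(hashes)
	fmt.Printf("root hash known only at the end:  %x\n", root[:8])
}
```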

From @gerrickw on Wed Mar 30 2016 23:30:26 GMT+0000 (UTC)

> 1. I’m not super certain what you mean by sequential writes. Do you mean fetching each block in order?

Correct, or at least somewhat in order, buffering until enough contiguous data has arrived for a write to occur. That would be much better for spinning media than seeking all over the disk. I suppose the same would apply when streaming video.
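
Roughly, I'm imagining something like this hypothetical reorder buffer (the block struct and channel-based fetcher are made up for illustration, not an IPFS API): blocks can arrive out of order from the swarm, but the disk only ever sees in-order appends, and memory use is bounded by how far ahead of the write cursor the fetch runs.

```go
package main

import (
	"fmt"
	"os"
)

// block is one downloaded chunk tagged with its position in the file.
// In a swarm, blocks can arrive out of order from many peers.
type block struct {
	index int
	data  []byte
}

// writeSequential buffers blocks that arrive early and flushes them to
// disk strictly in file order, so a spinning disk only ever sees
// sequential appends. The pending map is the memory cost: it grows with
// how far ahead of the write cursor the swarm has fetched.
func writeSequential(f *os.File, incoming <-chan block) (int, error) {
	pending := make(map[int][]byte)
	next := 0 // index of the next block to write
	for b := range incoming {
		pending[b.index] = b.data
		for {
			data, ok := pending[next]
			if !ok {
				break
			}
			if _, err := f.Write(data); err != nil {
				return next, err
			}
			delete(pending, next)
			next++
		}
	}
	return next, nil
}

func main() {
	f, err := os.Create("out.bin")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Simulate five blocks arriving out of order from different peers.
	in := make(chan block)
	go func() {
		for _, i := range []int{2, 0, 1, 4, 3} {
			in <- block{index: i, data: []byte(fmt.Sprintf("block-%d\n", i))}
		}
		close(in)
	}()

	n, err := writeSequential(f, in)
	if err != nil {
		panic(err)
	}
	fmt.Printf("wrote %d blocks in order\n", n)
}
```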

> 3. No. Content requested through IPFS needs to exist in IPFS before (or after) it's requested. IPFS can't pull content from an external source automatically. (yet?)

It seems like supporting an HTTP pull would be an easy way, in an Internet environment, for a user to move over to IPFS entirely. Example: start a local daemon and browse websites as usual; the daemon notices it doesn't have some CDN content, downloads it, and serves it to the user normally. Others in your swarm could then pull that CDN content from your daemon.
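
As a sketch of the plumbing I'm imagining (the daemon automatically noticing a cache miss is the hypothetical part, and the URL is made up; this just shows fetch-then-add by piping into the stock `ipfs add` command):

```go
package main

import (
	"fmt"
	"net/http"
	"os/exec"
	"strings"
)

// mirrorToIPFS fetches a URL over plain HTTP and pipes the bytes into
// `ipfs add -Q` (quiet mode, prints only the final hash), so swarm
// peers can then pull that content from this node instead of the CDN.
func mirrorToIPFS(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Stream the HTTP body straight into the add command.
	cmd := exec.Command("ipfs", "add", "-Q")
	cmd.Stdin = resp.Body
	out, err := cmd.Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	hash, err := mirrorToIPFS("https://example.com/some-asset.js")
	if err != nil {
		panic(err)
	}
	fmt.Println("now served to the swarm as", hash)
}
```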

> 4. Currently we cannot do ‘streaming’ adds: since the IPFS data structure is a merkledag, you need the entire structure before you can know the hash of its root. You could, however, stream each chunk hash out of band to the other servers so they can start fetching. This functionality is planned, but not yet implemented.

Oh, interesting. Is there a GitHub ticket I could follow for this functionality?

Thanks for your help