IPFS conceptual questions from a semi-noob

I've been intrigued by IPFS for the last few days after stumbling on a Hacker News article that mentioned it. I've watched a couple of talks and took a deeper dive today, experimenting with it myself.

Had some overall conceptual questions that I haven't found clear answers to, so thought I'd ask some of the insiders here.

  • Is IPFS intended for anything beyond static content delivery? I've heard a lot of talk of it heralding "the decentralized web of the future", "Web 3.0" and the like, but I can't see technically how it could fill the function of most modern dynamic websites. The most-visited websites at the moment include search engines, social networking apps, Wikipedia, Amazon, etc., most of which are heavily database- and search-driven. Querying a database of even a few terabytes this way would seem impractical; certainly you wouldn't store unique files for every state/modification of the database and push those out, and even index trees and the like would be impractical to spread out and synchronize. I've seen some mentions of orbit-db and pubsub and the like, but the explanations and descriptions seem a little handwavey and esoteric (possibly just due to my ignorance), and it doesn't look like there are proven scalable examples using them. So I totally see the static content scenario with IPFS, but 90% of the web nowadays is about managing, searching and modifying huge data sets, and I'm still scratching my head a bit on that. When all you need is one record, do you need to transfer the whole database, or at least a hefty search index?

  • Are there mathematical models of how the scaling works out? If BitTorrent is any indication, only the most popular torrents are reliable and well seeded, while a vast array of more esoteric stuff has too few seeders to really get at. Will only the most viral static content benefit from the bandwidth-saving get-it-from-your-neighbor effect? Will more esoteric stuff take ages to load, contributing to its continued unpopularity? I know the plan is to incentivize hosting with cryptocurrency, which sounds promising, but I'm curious how the sheer math of the scaling works, in an asymptotic O(N) sense, especially when every distinct version of a 'file' gets stored under its own immutable hash stamp.

  • On that note, is there any form of delta compression? Say you were implementing a chat app where each keystroke a group of people typed was appended to the end of a file and synchronized out to the swarm: does that create a distinct hash-name/file for every keystroke? Are those grouped at all on a block level, or does the total data size scale at an O(N^2) rate (each of the N versions storing a full copy of up to N bytes)? What about inserting bytes in the middle of a file? (There's a toy sketch of what I mean below, after the last bullet.)

  • One last thing: is there really much safety in preserving data permanently, especially unpopular stuff? If the originator drops off the network, hasn't paid someone to pin the content, and nobody happens to want it at that moment, couldn't it be lost forever just as on the current web?
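
To make the chunking question concrete, here's a toy sketch in plain Python of what I'm imagining - hand-rolled hashing, and certainly not IPFS's actual chunker or DAG format, just the general shape of a content-addressed block store where a file is a list of chunk hashes:

```python
import hashlib

CHUNK_SIZE = 4   # absurdly small, just for illustration

store = {}       # hash -> bytes: a toy content-addressed block store

def put(block: bytes) -> str:
    """Store a block under the hash of its contents; identical blocks
    collapse into a single entry automatically."""
    h = hashlib.sha256(block).hexdigest()
    store[h] = block
    return h

def add_file(data: bytes) -> str:
    """Split into fixed-size chunks, store each, then store a root
    block that is just the list of chunk hashes."""
    hashes = [put(data[i:i + CHUNK_SIZE])
              for i in range(0, len(data), CHUNK_SIZE)]
    return put("\n".join(hashes).encode())

add_file(b"hello wor")       # 3 chunks + 1 root block
n1 = len(store)
add_file(b"hello world!")    # the same content with 3 bytes appended
n2 = len(store)

# Only the changed tail chunk and the new root are new blocks; the
# first two chunks are shared between both versions. With a flat list
# like this the root itself still grows with the file, but linking
# chunks in a tree would mean an append rewrites only a log-depth path
# of internal nodes - and, for the database question above, a lookup
# would only need to fetch the blocks along one path.
print(n1, n2 - n1)           # prints: 4 2
```

If versions share blocks roughly like that, my O(N^2) worry mostly goes away; if every version is stored whole, it doesn't.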

Thanks in advance, and apologies if this seems unfair or critical; it's stuff I'm genuinely curious about. Even in the static content scenario, the idea of an efficient grassroots CDN for sharing data that doesn't require you to set up and pay for a web host etc. is pretty exciting.


Your post has so many questions in it that I'm not going to attempt to answer all of them, but I'll try to answer some. It sounds like you're looking for insider input, so I should note up front that I'm not an insider.

My impression is yes. See peerpad for a relatively simple example of using IPFS for collaborative document editing. A lot of your concerns come down to scalability, which I'm not sure has been tested very much at this point. The implementations are still under heavy development, and from what I can tell there are significant changes in progress that would affect IPFS's scalability (e.g., bitswap sessions).

One problem with BitTorrent that should be less of an issue for IPFS is that a file included in multiple torrents cannot be shared across the different torrent swarms. If I go to download a torrent that has no seeds, but someone else is seeding the same content as part of a different torrent, I can't download the files from that person. With IPFS I wouldn't expect this to be as much of a problem, since I can retrieve a file from anyone who has it, not just from peers in a specific swarm.
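
A rough sketch of that difference, using plain hashes as stand-ins for real content addresses (actual IPFS CIDs and directory objects are more elaborate, but the principle is the same):

```python
import hashlib

def address(data: bytes) -> str:
    # Toy content address: just the hash of the bytes themselves.
    return hashlib.sha256(data).hexdigest()

episode  = b"...the same big video file..."
readme_a = b"release notes from group A"
readme_b = b"release notes from group B"

# A torrent-style bundle is identified by a single hash over everything
# in it, so two bundles that share the episode still get unrelated
# identifiers:
assert address(episode + readme_a) != address(episode + readme_b)

# With per-file content addressing, each "directory" just points at
# the file's own address, which is identical in both contexts:
dir_a = {"episode.mkv": address(episode), "readme.txt": address(readme_a)}
dir_b = {"episode.mkv": address(episode), "readme.txt": address(readme_b)}
assert dir_a["episode.mkv"] == dir_b["episode.mkv"]

# So anyone holding the episode can serve it to someone fetching
# either "bundle", instead of each swarm being an island.
```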

If you're talking about Filecoin, my understanding is that contracts are not indefinite. Unless someone is deliberately hosting something without payment, I'd expect older versions of files to drop out of use over time.

Let's say one person is running an IPFS node with a rarely accessed file. Someone else decides they want that file and waits for it to download from the first peer; now there are two peers who can provide the file, and subsequent requests for it should be faster. I'm not sure how this would contribute to something staying unpopular.

Sure, but there's also a chance that someone else could come along with that same file and add it in the same way. Then all of those broken IPFS links now work again and nobody had to make the content accessible at some specific URL.


Thanks for the replies. Curious to dig into peerpad a bit and try to understand how it works.

Re: BitTorrent scaling, I imagine the content addressing would help a bit, but you'd still have vast amounts of subtly differing content (even torrents of the same material differ in their intro readmes etc., so if they're packaged as single IPFS "files" it seems you'd have trouble). Absent some sort of delta compression or block-level deduplication, it seems like this would create a content explosion and get out of hand quickly.

Re: popularity, what I was thinking is that if you want to watch, say, two videos, and one is as-yet-obscure and takes 20 seconds to start (you might give up on it before then) while the mega-popular one starts instantly, that creates a positive feedback loop for the video that's already popular.

Re: permanence, it sounds like there's really no guarantee there, any more than on the current web - although, like you said, content could come back at a later date if a peer comes back online (as with an obscure web server).

IPFS does do block-level deduplication. Blocks that are shared between directories or files are only stored once (per node). If you add a local directory to IPFS, add a file to the local directory, and then add the local directory again, only the new file will need to be added to IPFS.
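
A toy illustration of that re-add behavior, with hand-rolled hashing standing in for IPFS's actual UnixFS directory format: since the unchanged files' blocks are already in the store, re-adding the directory only creates the new file's block plus an updated directory object.

```python
import hashlib
import json

store = {}  # hash -> bytes: stand-in for a node's local block store

def put(block: bytes) -> str:
    h = hashlib.sha256(block).hexdigest()
    store[h] = block
    return h

def add_dir(files: dict) -> str:
    """Store each file, then a directory block mapping name -> hash."""
    entries = {name: put(data) for name, data in sorted(files.items())}
    return put(json.dumps(entries, sort_keys=True).encode())

files = {"a.txt": b"alpha", "b.txt": b"beta"}
add_dir(files)               # 2 file blocks + 1 directory block
before = len(store)

files["c.txt"] = b"gamma"    # add one file locally, then re-add the dir
add_dir(files)
print(len(store) - before)   # prints: 2 (new file block + new dir block)
```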

Right, there's no guarantee. The difference from HTTP is that with IPFS, anyone can make content available again at its original address. With HTTP it has to be the original publisher (or someone who gains control of the original domain, if the original publisher goes out of business or something), and the content has to be made available at the exact same address again for the original links to work. And it seems like a decent number of conventional HTTP sites can't even keep their URL schemes consistent over time, so links to their content break even while the site is still being maintained.
