I’ve been intrigued by IPFS for the last few days after stumbling on a Hacker News article that mentioned it. I’ve since watched a couple of talks and took a deeper dive today, experimenting with it myself.
I have some overall conceptual questions that I haven’t found clear answers to, so I thought I’d ask some of the insiders here.
Is IPFS intended for anything beyond static content delivery? I’ve heard a lot of talk of it heralding “the decentralized web of the future”, “Web 3.0” and the like, but I can’t see, technically, how it could fill the role of most modern dynamic websites. The most-visited websites at the moment include search engines, social networking apps, Wikipedia, Amazon, etc., most of which are heavily database- and search-driven. Querying a database of even a few terabytes over content-addressed storage seems impractical; you certainly wouldn’t store a unique file for every state/modification of the database and push those out, and even index trees and the like would be impractical to distribute and keep synchronized. I’ve seen some mentions of orbit-db and pubsub, but the explanations seem a little hand-wavy and esoteric (possibly just due to my ignorance), and I haven’t found proven, scalable examples that use them. So I completely see the static content scenario with IPFS, but 90% of the web nowadays is about managing, searching and modifying huge data sets, and I’m still scratching my head a bit on that. When all you need is one record, do you have to transfer the whole database, or a hefty search index?
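For concreteness, here’s roughly what I pieced together from the orbit-db and js-ipfs READMEs: a mutable key-value layer replicated over immutable IPFS blocks, so reading one record doesn’t mean fetching the whole database. The package names, calls, and the `'user-profiles'` store name below are my own guesses at the API, so please treat this as a sketch of my mental model rather than a tested example.

```ts
// Sketch only: calls taken from the js-ipfs / orbit-db READMEs as I understand
// them, not verified end to end.
import { create } from 'ipfs-core'
import OrbitDB from 'orbit-db'

async function main() {
  const ipfs = await create()                          // in-process IPFS node
  const orbitdb = await OrbitDB.createInstance(ipfs)   // orbit-db on top of it
  const db = await orbitdb.keyvalue('user-profiles')   // hypothetical store name

  await db.put('alice', { bio: 'hello world' })        // appends an entry to the replicated log
  console.log(db.get('alice'))                         // reads from the local replica

  // Other peers opening the same database address replicate the log via pubsub,
  // which is where my scaling question comes in: every write still seems to be
  // a new immutable entry stored under its own hash.
}

main()
```

If that picture is roughly right, my question is really about how such a log-replicated store behaves at the scale of a search engine or a social network, not a small demo.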
Are there mathematical models of how the scaling works out? If BitTorrent is any indication, only the most popular torrents are reliable and well seeded, while a vast long tail of more esoteric material has too few seeders to be usable. Will only the most viral static content benefit from the bandwidth-saving get-it-from-your-neighbor effect? Will more esoteric content take ages to load, reinforcing its unpopularity? I know the plans are to incentivize hosting with cryptocurrency, which sounds promising, but I’m curious how the sheer math of the scaling works in an asymptotic, big-O sense, especially when every distinct version of a ‘file’ gets stored under its own immutable hash.
On that note, is there any form of delta compression? Say you were implementing a chat app where every keystroke a group of people typed was appended to the end of a file and synchronized out to the swarm: does that create a distinct hash/file for every keystroke? Are versions deduplicated at the block level, or does the total stored data grow at an O(N^2) rate? And what about inserting bytes in the middle of a file rather than appending?
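To make the O(N^2) worry concrete: if the chat log after N keystrokes is N bytes long and every version is stored whole, the total stored data is 1 + 2 + … + N = N(N+1)/2 bytes, i.e. O(N^2), unless blocks are shared between versions. Here’s the kind of experiment I have in mind, written against js-ipfs calls as I understand them (`ipfs.add`, `ipfs.refs`); again a sketch, not something I’ve verified.

```ts
// Sketch only: simulate an append-only log and re-add it after every "keystroke".
import { create } from 'ipfs-core'

async function main() {
  const ipfs = await create()

  let log = ''
  const roots: string[] = []
  for (let i = 0; i < 5; i++) {
    log += `keystroke ${i}\n`               // simulate appending to a chat log
    const { cid } = await ipfs.add(log)     // every version gets its own root CID
    roots.push(cid.toString())
  }
  console.log(roots)                        // five distinct roots

  // For a file large enough to span multiple chunks, listing the child blocks of
  // the latest version should show whether earlier chunks are reused. Appends at
  // the end presumably leave earlier chunks identical (and deduplicated), but an
  // insert in the middle would shift every later chunk boundary under a
  // fixed-size chunker, which is exactly what I'm asking about.
  for await (const ref of ipfs.refs(roots[roots.length - 1])) {
    console.log(ref.ref)
  }
}

main()
```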
One last thing: how much of a guarantee is there, really, that data is preserved permanently, especially unpopular data? If the originator drops off the network, hasn’t paid someone to pin the content, and nobody happens to want it at that moment, couldn’t it be lost forever, just as on the current web?
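For clarity, this is the scenario I’m imagining, again sketched against js-ipfs calls as I understand them (`ipfs.add`, `ipfs.pin.add`), not tested:

```ts
// Sketch only: content seems to survive just as long as someone holds the blocks,
// either in cache or via an explicit pin.
import { create } from 'ipfs-core'

async function main() {
  const ipfs = await create()
  const { cid } = await ipfs.add('some rarely requested content')

  await ipfs.pin.add(cid)   // this node promises to keep the blocks around
  // ...but if this node (the only pinner) goes offline and nobody else has the
  // blocks cached, requests for the CID go unanswered, much like a dead link
  // on the current web.
}

main()
```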
Thanks in advance, and apologies if this comes across as unfair or overly critical; it’s all stuff I’m genuinely curious about. Even in the static content scenario alone, the idea of an efficient grassroots CDN for sharing data, one that doesn’t require you to set up and pay for a web host, is pretty exciting.