IPFS conceptual questions from a semi-noob

I’ve been intrigued by IPFS for the last few days after stumbling on a Hacker News article that mentioned it. I’ve watched a couple of talks, and today I took a deeper dive experimenting with it myself.

Had some overall conceptual questions that I haven’t found clear answers to, so thought I’d ask some of the insiders here.

  • Is IPFS intended for anything beyond static content delivery? I’ve heard a lot of talk of it heralding “the decentralized web of the future”, “Web 3.0” and the like, but I can’t see how, technically, it could fill the role of most modern dynamic websites. The most visited websites at the moment include search engines, social networking apps, Wikipedia, Amazon, etc., most of which are heavily database- and search-driven. Querying a database of even a few terabytes would seem impractical; certainly you wouldn’t store unique files for every state / modification of the database and push those out. Even index trees and the like would be impractical to spread out and synchronize. I’ve seen some mentions of orbit-db and pubsub and the like, but the explanations and descriptions seem a little hand-wavy and esoteric (possibly just due to my ignorance), and it doesn’t look like there are proven scalable examples using them. So I totally see the static content scenario with IPFS, but 90% of the web nowadays is about managing, searching and modifying huge data sets, and I’m still scratching my head a bit on that. When all you need is one record, do you need to transfer the whole database or a hefty search index?

  • Are there mathematical models of how the scaling works out? If BitTorrent is any indication, only the most popular torrents are reliable and have enough seeders, and there’s a vast array of more esoteric stuff with too few seeders to actually download. Will only the most viral static content benefit from the bandwidth-saving get-it-from-your-neighbor effect? Will more esoteric stuff take ages to load and so stay unpopular? I know the plan is to incentivize hosting with cryptocurrency, which sounds promising, but I’m curious how the sheer math of the scaling works, in an asymptotic O(N) sense, especially when every distinct version of a ‘file’ gets stored under its own immutable hash stamp.

  • On that note, is there a form of delta compression at all? If you were implementing a chat app, say, where each keystroke a group of people typed was appended to the end of the file and synchronized out to the swarm, does that create a distinct hash-name/file for every keystroke? Are those grouped at all on a block level or does the total data size scale at an O(N^2) rate? What about inserting bytes in the middle of a file?

  • One last thing, is there much safety really in preserving data permanently, especially unpopular stuff? If the originator drops off the network, and hasn’t paid someone to pin it etc., and that piece of content isn’t in demand at that moment, couldn’t it be lost forever just as with the current web?
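To put a number on the chat-app worry: assume, purely for illustration, that each keystroke appends one byte and that every version of the file is stored as a complete, independent copy with nothing shared between versions. After N keystrokes the total stored is

```latex
S_{\text{naive}}(N) = \sum_{k=1}^{N} k = \frac{N(N+1)}{2} = O(N^2)
```

For N = 10,000 keystrokes that is 50,005,000 bytes (about 50 MB) of storage for roughly 10 kB of actual text. Whether IPFS really behaves like this depends on how much is shared between versions at the block level.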

Thanks in advance, and apologies if this seems unfair or critical. It’s stuff I’m genuinely curious about. Even in the static content scenario, the idea of an efficient grassroots CDN for sharing data that doesn’t require you to set up and pay for a web host etc. is pretty exciting.


Your post has so many different questions in it that I’m not going to attempt to answer all of them, but I’ll try to answer some. Since it sounds like you’re looking for insider input, I should note that I’m not an insider.

My impression is yes, IPFS is meant for more than static content delivery. See peerpad for a simpler example of using IPFS for collaborative document editing. Many of your concerns are about scalability, which I’m not sure has been tested much at this point. The implementations are still under heavy development, and from what I can tell there are significant changes in progress that would affect IPFS’s scalability (e.g., Bitswap sessions).
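For a rough idea of how dynamic, collaborative data can sit on top of immutable content-addressed blocks, here is a minimal sketch of a hash-linked append-only log: each entry is addressed by its hash and points to the hash of the previous entry, so appending a keystroke adds one small entry rather than a new copy of the whole document. It isn’t meant to describe peerpad’s actual implementation (I haven’t checked it); as I understand it, orbit-db’s log follows a similar idea but adds multiple writers, merging, and real IPFS storage. A plain Python dict stands in for the IPFS block store, and `Entry` and `HashLinkedLog` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One immutable log entry; `prev` is the hash of the previous entry (None for the first)."""
    payload: str
    prev: Optional[str]

    def digest(self) -> str:
        # Hash a canonical JSON encoding so identical entries get identical addresses.
        data = json.dumps({"payload": self.payload, "prev": self.prev}, sort_keys=True)
        return hashlib.sha256(data.encode("utf-8")).hexdigest()


class HashLinkedLog:
    """Hypothetical toy append-only log kept in a content-addressed dict (hash -> entry)."""

    def __init__(self) -> None:
        self.store: dict[str, Entry] = {}
        self.head: Optional[str] = None

    def append(self, payload: str) -> str:
        entry = Entry(payload, self.head)
        h = entry.digest()
        self.store[h] = entry
        self.head = h  # only the head pointer moves; existing entries are never rewritten
        return h

    def replay(self) -> list[str]:
        # Walk back from the head via `prev` links, then reverse into append order.
        out: list[str] = []
        h = self.head
        while h is not None:
            entry = self.store[h]
            out.append(entry.payload)
            h = entry.prev
        return list(reversed(out))


log = HashLinkedLog()
for keystroke in "hi!":
    log.append(keystroke)
print(log.replay())  # ['h', 'i', '!']
```

Since the head hash covers the whole history through the `prev` links, two peers that hold the same head know they have the same log; a peer that is behind only needs the entries it is missing.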

One problem with BitTorrent that is less of an issue for IPFS is that a file included in multiple torrents can’t be shared across those torrents’ swarms. If I try to download a torrent that has no seeds, but someone else is seeding the same content in a different torrent, I can’t download the files from that person. With IPFS I wouldn’t expect this to be as much of a problem, since I can retrieve a file from anyone who has it, not just from peers in a specific swarm.
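A deliberately simplified model of that difference: BitTorrent-style tracking indexes peers by swarm, while IPFS-style lookup indexes them by the content itself. The SHA-256 hex digest here is only a stand-in for a real IPFS CID, and `share`, `swarms` and `providers` are hypothetical names, not any real API.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def content_address(data: bytes) -> str:
    # Simplified stand-in for a CID: the SHA-256 hex digest of the bytes.
    return hashlib.sha256(data).hexdigest()


# BitTorrent-style: peers are tracked per swarm (per torrent).
swarms: dict[str, set[str]] = defaultdict(set)
# IPFS-style: peers are tracked per piece of content, however it was packaged.
providers: dict[str, set[str]] = defaultdict(set)


def share(peer: str, swarm: str, data: bytes) -> None:
    """Record that `peer` is seeding `data` as part of `swarm`."""
    swarms[swarm].add(peer)
    providers[content_address(data)].add(peer)


movie = b"the same video file bytes"
share("bob", "torrent-B", movie)       # Bob seeds the file inside torrent B

print(len(swarms["torrent-A"]))        # 0: nobody to ask inside torrent A's swarm
print(providers[content_address(movie)])  # {'bob'}: found by what the file is
```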

If you’re talking about Filecoin, my understanding is that contracts are not indefinite. Unless someone is deliberately hosting something without payment I’d expect older versions of files to drop out of use over time.

Let’s say that one person is running an IPFS node with a rarely accessed file. Someone else decides they want that file and waits for it to download from the first peer; now there are two peers who can provide the file, and subsequent requests for it should be faster. I’m not sure how this would contribute to something not being popular.
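One crude way to model this, assuming (unrealistically) that each provider is online independently with the same probability p, is that the chance of at least one provider being reachable is 1 - (1 - p)^n for n providers. It also assumes every fetcher keeps the file; a node that doesn’t pin it may garbage-collect it later. `availability` is a hypothetical helper, not a measured model of the real network.

```python
def availability(num_providers: int, p_online: float) -> float:
    """Chance that at least one provider is online, assuming providers are
    independent and each is online with probability p_online (a toy assumption)."""
    if not 0.0 <= p_online <= 1.0:
        raise ValueError("p_online must be between 0 and 1")
    return 1.0 - (1.0 - p_online) ** num_providers


# Each fetch that keeps a copy adds a provider, so availability climbs quickly.
for n in (1, 2, 5, 10):
    print(n, round(availability(n, 0.3), 3))
```

Under these assumptions a file whose single host is online 30% of the time is reachable about 83% of the time once five peers hold it, which is the opposite of a rich-get-richer effect for obscure content, at least once someone has fetched it.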

Sure, but there’s also a chance that someone else could come along with that same file and add it in the same way. Then all of those broken IPFS links now work again and nobody had to make the content accessible at some specific URL.


Thanks for the replies. Curious to dig into peerpad a bit and try to understand how it works.

Re: BitTorrent scaling, I imagine the content addressing would help a bit, but you’d still have vast amounts of differing content (even torrents themselves differ in their intro READMEs etc., so if they’re packaged as single IPFS “files” it seems you’d have trouble). It seems that without some sort of delta compression or block-level deduplication this would create a content explosion and get out of hand quickly.

Re: popularity, what I was thinking is that if you want to watch e.g. two videos, and one is as-yet-obscure and takes 20 seconds to start (you might give up on it before that) and one is mega-popular and starts instantly, that will create a sort of positive feedback loop for the video that’s already popular.

Re: permanence, it sounds like there’s no real guarantee there, any more than with the current web, although, like you said, content could come back at a later date if a peer comes back online (as with an obscure web server).

IPFS does do block-level deduplication. Blocks that are shared between directories or files are only stored once (per node). If you add a local directory to IPFS, add a file to the local directory, and then add the local directory again, only the new file will need to be added to IPFS.
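A simplified sketch of block-level deduplication with fixed-size chunks. The 4-byte block size only keeps the example readable (I believe the default chunker in go-ipfs uses 256 KiB blocks), the SHA-256 digests stand in for real CIDs, and the Merkle DAG that IPFS builds over the blocks (whose root does change with every version) is ignored. `chunk_hashes` and `BlockStore` are hypothetical names.

```python
from __future__ import annotations

import hashlib


def chunk_hashes(data: bytes, block_size: int) -> list[str]:
    """Split data into fixed-size blocks and return each block's SHA-256 hex digest."""
    return [
        hashlib.sha256(data[i : i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


class BlockStore:
    """Hypothetical toy content-addressed block store: each distinct block is kept once."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.blocks: set[str] = set()

    def add(self, data: bytes) -> int:
        """Add a file and return how many of its blocks were new (actually stored)."""
        new = 0
        for h in chunk_hashes(data, self.block_size):
            if h not in self.blocks:
                self.blocks.add(h)
                new += 1
        return new


store = BlockStore(block_size=4)
v1 = b"hello world!"   # blocks: b"hell", b"o wo", b"rld!"
v2 = v1 + b" more"     # appending: the three old blocks are reused
v3 = b"X" + v1         # inserting at the front shifts every block boundary
print(store.add(v1), store.add(v2), store.add(v3))  # prints: 3 2 4
```

Appending only stores the new tail (plus a rewritten last block if it was partial), so an append-only chat log grows roughly linearly rather than O(N^2). Inserting bytes near the start is the bad case for fixed-size chunks, since nothing after the insertion lines up any more; as far as I know, `ipfs add` also offers a content-defined Rabin chunker (`--chunker=rabin`) aimed at that case.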

Right, there’s no guarantee. The difference with HTTP is that with IPFS anyone can make content available again at its original address. With HTTP it has to be the original publisher (or someone who gains control of the original domain if the original publisher goes out of business or something), and the content has to be made available at the exact same address again for the original links to work. A decent number of conventional HTTP sites can’t even keep their URL schemes consistent enough over time to avoid breaking links to their content, even while they’re still maintaining the site.
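The core of that difference in a few lines: a content address is derived from the bytes alone, while an HTTP URL encodes who serves the content and where. The SHA-256 digest is only a stand-in for a CID, and `content_address` / `location_address` are hypothetical helpers. One caveat I’m fairly sure of: a real CID also depends on how the content was added (chunker, CID version and similar options), so re-adding the same file with different settings can produce a different address.

```python
import hashlib


def content_address(data: bytes) -> str:
    # Simplified stand-in for a CID: the address depends only on the bytes.
    return hashlib.sha256(data).hexdigest()


def location_address(host: str, path: str) -> str:
    # An HTTP-style address depends on who serves the content and where they put it.
    return f"https://{host}{path}"


page = b"<h1>An old article</h1>"
original = content_address(page)  # added by the original publisher
restored = content_address(page)  # re-added years later by someone else
print(original == restored)       # True: old links work again

old_url = location_address("example.com", "/2015/article.html")
new_url = location_address("mirror.example.org", "/archive/article.html")
print(old_url == new_url)         # False: a mirror can't revive the old URL
```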
