How to build "linked data"

I’m working on a fix for a large-scale problem and would appreciate some guidance. I’ve used IPFS with great success on other projects, but I’ve run into a snag. I have a public IPFS node with limited storage that works well for serving requests for pinned data. The hitch is that I have a massive amount of data in public S3 and Backblaze B2 buckets that I would like to serve via IPFS. IPFS can add with --nocopy, but I’m trying to understand whether I can hash the data and serve it from these two locations without copying or moving it into an IPFS datastore. If there are steps for doing this, please send me a link or a how-to. I would love to get this data hashed and served via IPFS rather than through public S3 and B2 links. Any guidance is greatly appreciated.

If we do not copy the data to the IPFS node (at least buffering it temporarily, if not pinning it permanently), then I believe we lose the whole advantage of IPFS.

For example, imagine 10 IPFS nodes (spread across the globe) that can serve this S3 content. If none of them copies the data, then every request that reaches them has to be forwarded to S3, which is no better than querying S3 directly. On the other hand, if frequently accessed data is copied to the IPFS nodes as it passes through them, then chances are that after some time S3 can be relieved almost completely, since most of the data will be “hot” on the IPFS nodes, ready to be served without the requests having to hit S3.
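To make the caching argument concrete, here is a minimal sketch of one such node as a read-through cache. It is only an illustration: CachingNode, the origin dictionary standing in for the S3 bucket, and the LRU eviction policy are all assumptions of mine, not how an actual IPFS node manages its datastore.

```python
from collections import OrderedDict


class CachingNode:
    """Hypothetical node that keeps a bounded copy of the data it serves."""

    def __init__(self, origin, capacity):
        self.origin = origin          # dict standing in for the S3 bucket
        self.capacity = capacity      # max number of objects kept locally
        self.cache = OrderedDict()    # least recently used entry first
        self.origin_hits = 0          # how often we had to go back to "S3"

    def get(self, key):
        if key in self.cache:
            # Hot data: served locally, the origin is not contacted.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cold data: fetch from the origin and keep a copy on the way through.
        self.origin_hits += 1
        data = self.origin[key]
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return data


origin = {"a": b"alpha", "b": b"beta", "c": b"gamma"}
node = CachingNode(origin, capacity=2)
for key in ["a", "a", "b", "a", "a"]:
    node.get(key)
print(node.origin_hits, "origin fetches for 5 requests")
```

With a node that never copies, every one of those requests would count as an origin fetch; with the copy, only the first request for each object does.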

On the other hand, if you are interested in using IPFS only as a gateway (without caching the content), then what you are suggesting may make sense, since we do not want the gateway to copy everything that passes through it; it should act only as, well, a “gateway”.

Unfortunately, the current implementation of IPFS gateways also copies the content. I have a feature request (here: https://github.com/ipfs/interface-js-ipfs-core/issues/476 ), where the goal is to have the gateway respond with the “source peer id” instead of returning the data directly, so that the client can connect to that peer and get the data from there rather than through the gateway. That kind of mechanism should address your requirement too.

Torrent tracker servers work on this model. They respond with the source peer ID (rather than the content itself), so that clients can establish a direct P2P connection with the peer and do not always have to go through the gateway.
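A rough sketch of that tracker-style flow is below. Everything in it is hypothetical (the Tracker and Peer classes, client_get, and the use of a plain SHA-256 hex digest in place of a real IPFS CID); it only shows the shape of the idea: the lookup service answers with peer IDs, and the client fetches and verifies the data from the peer itself.

```python
import hashlib


def content_id(data: bytes) -> str:
    # Assumption: a plain SHA-256 hex digest stands in for a real IPFS CID.
    return hashlib.sha256(data).hexdigest()


class Tracker:
    """Hypothetical lookup service: knows who has the data, never holds it."""

    def __init__(self):
        self.providers = {}  # content id -> set of peer ids

    def announce(self, cid, peer_id):
        self.providers.setdefault(cid, set()).add(peer_id)

    def lookup(self, cid):
        # Respond with source peer ids only, not with the content.
        return sorted(self.providers.get(cid, ()))


class Peer:
    """Hypothetical peer that stores data and announces it to the tracker."""

    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.blocks = {}

    def add(self, data, tracker):
        cid = content_id(data)
        self.blocks[cid] = data
        tracker.announce(cid, self.peer_id)
        return cid

    def fetch(self, cid):
        return self.blocks[cid]


def client_get(cid, tracker, peers):
    # Ask where the data lives, then fetch it directly from a source peer.
    for peer_id in tracker.lookup(cid):
        data = peers[peer_id].fetch(cid)
        if content_id(data) == cid:  # verify against the requested hash
            return data
    raise KeyError(cid)


tracker = Tracker()
peers = {"peer-a": Peer("peer-a"), "peer-b": Peer("peer-b")}
cid = peers["peer-a"].add(b"hello", tracker)
data = client_get(cid, tracker, peers)
```

Because the content is addressed by its hash, the client can check what it received no matter which peer served it, so the lookup service never needs to see the bytes.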

That kind of model would greatly improve the reach of IPFS (e.g. even torrents could be served through IPFS by reusing the BitTorrent mainline DHT, which would further improve the performance of DHT resolution, since the mainline DHT is very rich).

This was an awesome answer. I had never thought of it from that point of view, and it helped me greatly. Thank you so much for your time and effort in explaining it to me.