How secret are CIDs, and what should I think about hosting user-uploaded media on IPFS?

One of my projects is a platform for organizing and hosting community events (which I’m cleaning up for an open source release), two live examples:

These sites end up with a lot of uploaded multimedia: user photos, session photos, and recordings/media for sessions. Currently this presents challenges when people want to fork an instance to experiment or duplicate content into a new year’s site. In my mind this application aligns really well philosophically with IPFS, because a big part of how I set these sites up is making sure content is preserved indefinitely, so the connections and materials such communities build don’t rot away after everyone goes home.

Currently, no media uploaded to the system is considered secret or secure: if you obtain the (currently integer) ID of an upload, it’s public to share and download. I’m considering moving this function to IPFS, so that each host would be an IPFS node and the database would just store CIDs for media without worrying about where the files live. If I did that, I imagine a cool next move would be making a bigger deal in the UI of having presenters and attendees use the site to archive media/materials from sessions, and maybe educating in-UI a bit about why having all of that uploaded to IPFS is useful (e.g. have a link to a pin list per session?)

What sort of questions should I be thinking through in an application like this? So far I’m wondering:

  • Would it be good or bad form to serve image assets directly from https://ipfs.io/ipfs/... or should I set up my own HTTP endpoint that serves content from a node where the content is likely to already be pinned? (there can be up to thousands of headshots served on a page)
  • Can CIDs be considered a secret? I expect that one thing that might put organizers on edge with IPFS being used for this is if content would actively propagate on its own or be easily enumerable for bulk scraping beyond whatever CIDs the UI chooses to render. Could the setup be such that content generally doesn’t ever leave our node until we hand out a CID?
  • Anything else I should think about or read up on?

I’m just a newbie, but I think you’re getting a slightly wrong picture.

IPFS doesn’t store files indefinitely per se. Files are only kept by the nodes that have requested them, i.e. if a file is constantly popular, it stays available. Keeping content stored permanently is called pinning, and services that offer it are called pinning services, which are mostly a for-profit thing.

CIDs are secret in the sense that, AFAIK, you can’t get a list of CIDs from a peer. You have to ask whether the peer has a specific one, i.e. “do you have CID x, y? – yes/no.”

ipfs.io and cloudflare-ipfs.com are IPFS gateways. You could use js-IPFS so that each browser works as an IPFS node.

> IPFS doesn’t store files indefinitely per se. Files are only kept by the nodes that have requested them, i.e. if a file is constantly popular, it stays available. Keeping content stored permanently is called pinning, and services that offer it are called pinning services, which are mostly a for-profit thing.

Right, I’m picturing that my application server would run as an IPFS node that pins all its user content, in place of storing it directly on disk as it does now.
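For what it’s worth, here is a minimal sketch of that flow against a local Kubo (go-ipfs) node’s HTTP RPC API. The endpoint and port are Kubo defaults; the upload path and the idea of a “media table” are placeholders for whatever the app actually does:

> <?php
> // Minimal sketch, assuming a local Kubo node with its RPC API on the default 127.0.0.1:5001.
> $ch = curl_init('http://127.0.0.1:5001/api/v0/add?pin=true');
> curl_setopt($ch, CURLOPT_POST, true);
> curl_setopt($ch, CURLOPT_POSTFIELDS, ['file' => new CURLFile('/tmp/headshot.jpg')]);
> curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
> $response = curl_exec($ch);
> curl_close($ch);
>
> // Kubo answers with JSON like {"Name":"headshot.jpg","Hash":"Qm...","Size":"..."}.
> $cid = json_decode($response, true)['Hash'];
> // Store $cid in the media table instead of a filesystem path.
> ?>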

> You could use js-IPFS so that each browser works as an IPFS node.

I’m not sure I want to do that; I still want pages to load as quickly as possible. cloudflare-ipfs looks like a great option, though I’m wondering: if I wanted to be less reliant on free private infra, would it be better to run my own gateway on the same server that’s providing HTTP for the site and running its IPFS node with everything pinned? Or is there semantic value in sticking to the standard ipfs.io gateway URLs?

I guess your method is good.
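For what it’s worth, the difference in the page markup is just the URL prefix; the /ipfs/<CID> part is identical whichever gateway serves it. A minimal sketch, where media.example.org is a hypothetical gateway you’d run yourself (Kubo’s gateway listens on port 8080 by default, so you’d typically reverse-proxy it):

> <?php
> // Hypothetical helper: the same CID can be served from any gateway, yours or a public one.
> function ipfs_url(string $cid, string $gateway = 'https://media.example.org'): string {
>     return rtrim($gateway, '/') . '/ipfs/' . $cid;
> }
> // Self-hosted gateway (content is likely already pinned on the node behind it):
> echo ipfs_url('QmX8DwrfM8D9e3PbBoKjqinjR2jtMT97ewYecu7r9D7jnP');
> // Public gateway, same content:
> echo ipfs_url('QmX8DwrfM8D9e3PbBoKjqinjR2jtMT97ewYecu7r9D7jnP', 'https://ipfs.io');
> ?>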

You could also detect whether the user is IPFS-aware.

One method is the x-ipfs-path HTTP header.

You can try this: http://plesko.si/test/header.php as an example.
With an IPFS-aware browser it just redirects via IPFS; otherwise it redirects via HTTP.
The code is fairly simple:
> <?php
> // Tell IPFS-aware clients (e.g. the IPFS Companion extension) where the same content lives on IPFS.
> header('x-ipfs-path: /ipfs/QmX8DwrfM8D9e3PbBoKjqinjR2jtMT97ewYecu7r9D7jnP');
> // Everyone else simply follows the plain HTTP redirect.
> header('Location: cat.jpg');
> ?>

Another way would be to load a JS file containing a variable via IPFS and check the variable.
Example: http://plesko.si/test/ipfs-aware.html

You could also use a combination of both: http://plesko.si/test/ipfs-aware2.html
It requests a JavaScript file via the first method; a client that doesn’t recognize the x-ipfs-path header follows the plain HTTP redirect and gets a script that sets the variable to false.
> <?php
> // Same trick as above, but the redirect target is a JavaScript file instead of an image.
> header('x-ipfs-path: /ipfs/QmWWDaXBcCVi16SSTjivkUXZsNU2AZa2Jiajr7kEGZFNsE');
> // Clients that ignore x-ipfs-path fall through to http.js, which sets the variable to false.
> header('Location: http.js');
> ?>

Well, maybe there are better ways. No idea. 🙂

CIDs can be observed by the peers that participate in routing. You can take a look at this article to learn more:
https://niverel.tymyrddin.space/en/research/dawnbreaker/ipfs/security

This only exposes CIDs that have already been published outside of their origin node in one way or another, though, right? It isn’t a path to someone enumerating the CIDs that exist on a node?

I’m starting from the premise that users understand that all content they upload is public and could be scraped, so really what I’m trying to determine is whether switching to IPFS for storage and transmission makes it any easier for someone to enumerate/scrape content than crawling a site with directory listings disabled and downloading everything linked. So far, it sounds like the answer is no?

As far as I understand, when a node adds a file to IPFS, it announces it to the network by sending the generated CID to the node that is responsible for storing the link between the CID and the origin node. In this process, the announcement travels through several nodes, and each of these nodes is able to catch the published CID and download it (for example, that is how ipfs-search.com works).
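If you want to see this in action for a given CID, you can ask your own node who is currently advertising it as a provider. A minimal sketch against the local Kubo RPC API; older Kubo versions expose this as /api/v0/dht/findprovs instead:

> <?php
> // Minimal sketch, assuming a local Kubo node with its RPC API on 127.0.0.1:5001.
> $cid = 'QmX8DwrfM8D9e3PbBoKjqinjR2jtMT97ewYecu7r9D7jnP';
> $ch = curl_init('http://127.0.0.1:5001/api/v0/routing/findprovs?arg=' . $cid);
> curl_setopt($ch, CURLOPT_POST, true);
> curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
> // The response is a stream of newline-delimited JSON objects describing provider peers found in the DHT.
> echo curl_exec($ch);
> curl_close($ch);
> ?>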
One way around this is to encrypt the content before publishing it to IPFS, so that no one can decrypt it except those who know the key.
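A minimal sketch of that idea using PHP’s bundled sodium extension; how you store and distribute the key (per file, per session, per event) is the part you would have to design yourself:

> <?php
> // Minimal sketch, assuming PHP 7.2+ with the sodium extension enabled.
> $key = sodium_crypto_secretbox_keygen();            // keep this key outside IPFS
> $nonce = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
> $plaintext = file_get_contents('cat.jpg');
>
> // Prepend the nonce so the ciphertext file is self-contained; add the .enc file to IPFS instead.
> $ciphertext = $nonce . sodium_crypto_secretbox($plaintext, $nonce, $key);
> file_put_contents('cat.jpg.enc', $ciphertext);
>
> // Decryption: split the nonce back off and open the box with the same key.
> $raw = file_get_contents('cat.jpg.enc');
> $nonce2 = substr($raw, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
> $decrypted = sodium_crypto_secretbox_open(substr($raw, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES), $nonce2, $key);
> ?>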
