Blocking copyrighted material / adult content on an HTTP gateway

I created a website called https://ipfs.video to showcase the ability to stream large video files directly from IPFS using js-ipfs.

Now I would like to allow people to index webm files, so I started development on https://index.ipfs.video to create an indexing service for webm files on IPFS.

On both my ipfs.video website and my index website, I would like to block illegal/copyrighted and adult content. Some gateways already do that, but I was wondering whether there is already a service for openly sharing the blocked CIDs? Because if not, I think I should probably start building one first :wink:

Cool project! I didn’t know you could stream files like that. I did it with IPLD and pre-chunked videos but your method is way simpler.

As for filtering, that seems very hard to do. Content addressing means that changing a single bit results in a new CID.
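To make that concrete, here is a minimal sketch. It does not build a real CID; it only computes the SHA-256 digest (the default hash function IPFS uses for CIDs), which is enough to show the effect. The sample bytes and the `sha256_hex` helper are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of data (illustrative helper, not an IPFS API)."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for a file's bytes; a real video would be much larger.
original = b"example video bytes" * 1000

# Flip the lowest bit of the first byte and leave everything else untouched.
modified = bytes([original[0] ^ 0x01]) + original[1:]

digest_a = sha256_hex(original)
digest_b = sha256_hex(modified)

print(digest_a)
print(digest_b)
print("same digest:", digest_a == digest_b)
```

The two digests share nothing useful, so a blocklist keyed on the first one will never match the second file.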

You would need some kind of AI with a CRDT that can be updated by everyone.

Create a positive sample group (a classifier model) by feeding material from adult websites into your site; then, when a new upload matches the adult-content classifier, don’t link the video. (If you don’t want any adult content at all, don’t link the videos used to train the classifier, either.) The only bottleneck is copyrighted content: copyright holders usually don’t publish a classifier for their content.

There are some copy-protection systems that try to add an ID, usually called a “fingerprint”, to the optional text block in an mp4. This ID is then used to determine the origin of the mp4. You could block mp4s using this ID system, but that wouldn’t catch DVD rips and other copyrighted material that carries no such ID.

Blocking CIDs isn’t going to work that well…

Any given file will have a different CID if a single bit in it is changed. Simply passing a file through a re-encoding process (ffmpeg for audio/video, or mat2 for images, PDFs, and other files) will change its CID as well as the metadata and any other top-level identifying marks.

To identify copyrighted and/or adult material, algorithmic analysis of the frames or images themselves is required.
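One simple form of such analysis is a perceptual hash, which, unlike a CID, stays nearly the same when the pixels change only slightly. Below is a rough sketch of an "average hash" over a single frame. It assumes the frame has already been decoded and downscaled to an 8x8 grayscale grid (for example with ffmpeg plus an image library). The function names, the sample frames, and `MATCH_THRESHOLD` are all hypothetical, and a real system would need something far more robust than this.

```python
from typing import List

# Hypothetical cut-off: frames whose hashes differ in at most this many
# of the 64 bits are treated as "probably the same picture".
MATCH_THRESHOLD = 10


def average_hash(pixels: List[List[int]]) -> int:
    """Build a bit string with one bit per pixel: 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Made-up 8x8 grayscale frame: a left-to-right brightness gradient.
frame = [[x * 32 for x in range(8)] for _ in range(8)]

# Crude stand-in for a re-encode: every pixel is slightly brighter.
reencoded = [[min(255, p + 3) for p in row] for row in frame]

# A different picture: a top-to-bottom gradient.
unrelated = [[y * 32 for _ in range(8)] for y in range(8)]

d_same = hamming_distance(average_hash(frame), average_hash(reencoded))
d_diff = hamming_distance(average_hash(frame), average_hash(unrelated))

print("re-encoded frame matches:", d_same <= MATCH_THRESHOLD)
print("unrelated frame matches:", d_diff <= MATCH_THRESHOLD)
```

A shared blocklist could then hold hashes like these rather than CIDs, at the cost of false positives and of having to decode every file before checking it.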