This is a question about regulatory compliance. Without the ability to delete content from IPFS, how does one maintain compliance with GDPR, CPRA and similar regulations that implement a “right to be forgotten” and similar individual rights? Has there been a formal analysis of this question in relation to IPFS? We are in the early days of creating web3 applications and I think this would be an important topic for developers and web3 entrepreneurs.
You implement that just like you would do for web2.
When someone asks you to remove data from your servers, you just remove it. If another copy exists on a server you don’t control, that’s not your business; it’s the other server owner’s.
IPFS does not force anything. You are allowed to remove files from your nodes if you want; what you can’t do is force other nodes to do the same. Each node owner does whatever they want and pins (or doesn’t pin) the files they want.
@Jorropo Thanks for your thoughts on this! I was under the incorrect impression that it was not possible to delete files that you add to IPFS. I now understand that it is possible to delete from IPFS, and I understand that it is not possible to delete a file that someone else has retrieved and pinned.
My broader question is about many of the other aspects of GDPR and CPRA. Asking for permission before sharing, knowing and being able to report on who has been a sharing recipient, and many others. It would be great if there was a deeper analysis of GDPR and IPFS in the broader scope. Perhaps that is not available now.
That sentence is probably more correct than it is incorrect.
Let me do the parallel with HTTP.
You cannot delete files from HTTP, because if another server you don’t control hosts the data, you cannot delete it from that server. However, you can delete data from your own server: you control it, it’s yours.
Well for IPFS things are exactly the same:
You cannot delete files from IPFS, because if another server you don’t control hosts the data, you cannot delete it from that server. However, you can delete data from your own server: you control it, it’s yours.
The only difference is that with HTTP, people need to find another server hosting the data more or less manually, and they won’t be certain the data is actually the same; maybe the other server modified it before sending it to them.
IPFS uses content addressing, so as long as the CIDs (which are just hashes) are the same, IPFS clients automagically take care of downloading and checking the data.
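To make the content-addressing point concrete, here is a minimal Python sketch. It is not how IPFS is implemented: a plain SHA-256 hex digest stands in for a real CID, and the names content_id and fetch_verified are hypothetical, made up for illustration. It only shows why a client can accept data from any untrusted server once it knows the hash it is looking for.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Stand-in for a CID (assumption): the hex SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(requested_id: str, untrusted_payloads: list) -> bytes:
    """Return the first payload whose digest matches the requested identifier.

    Each payload represents what some server we don't control sent back.
    """
    for payload in untrusted_payloads:
        if content_id(payload) == requested_id:
            return payload
    raise LookupError("no payload matched the requested identifier")


original = b"hello ipfs"
wanted = content_id(original)

# The first "server" tampered with the data, the second did not.
responses = [b"hello ipfs (modified)", b"hello ipfs"]
verified = fetch_verified(wanted, responses)
```

The modified copy is rejected because its digest differs, which is the guarantee plain HTTP mirrors do not give you.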
There’s a good article here on encrypted personal data and GDPR. Whilst it doesn’t give any definitive answers, it’s a good place to start:
One thing I would recommend is to engage with the ICO and ask them questions. They have an SME advisory team, who I’ve spoken to and am working with (on something unrelated) and they are genuinely keen to help when it comes to innovative technologies and where GDPR does / doesn’t apply:
What about the peer ID? Does it count as personal data under GDPR?
The Peer ID is created when the IPFS node is initialized and is essentially a cryptographic hash of the node’s public key.
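As a rough illustration of "a hash of the public key", the Python sketch below hashes some key bytes with SHA-256, prefixes the digest with the sha2-256 multihash header, and base58-encodes the result. This is a simplification under stated assumptions: real libp2p Peer IDs are derived from a serialized form of the key, and the exact rules depend on the key type, so the output here will not match a real node's ID. The function names and the sample key bytes are hypothetical.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    num = int.from_bytes(data, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = B58_ALPHABET[rem] + encoded
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def sketch_peer_id(public_key_bytes: bytes) -> str:
    """Simplified, illustrative identifier: base58(multihash(sha256(key)))."""
    digest = hashlib.sha256(public_key_bytes).digest()
    # 0x12 = sha2-256 code, 0x20 = digest length (32 bytes).
    multihash = bytes([0x12, 0x20]) + digest
    return base58btc(multihash)


# Placeholder bytes, not a real key.
example_id = sketch_peer_id(b"not-a-real-public-key")
```

The same key bytes always give the same identifier, which is why the ID is stable for as long as the node keeps its key.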
More broadly, I am wondering how GDPR applies at all to data collection when every user is a host, rather than the company itself.
I understand that the developer is legally responsible for activity on the app, but when it comes to data, everybody holds a replica of everything, whereas GDPR was created on the premise that the company owns the data.
IPFS leaks a fair amount of information:
- Your peer ID is persistent, by default.
- IPFS discovers and advertises your addresses. You’d want to run your IPFS node inside a container (or, even better, a VM) that only knows about the VPN interface.
- The content stored by a node can be used to fingerprint it.
- DHT records stored by a node can definitely be used to fingerprint it.
Does that mean that by just using IPFS itself to host your app you are already liable for this information even if you don’t use it in your app?
While a long string of letters and numbers may not be a “Johnny Appleseed” level of human-readable specificity, your PeerID is still a long-lived, unique identifier for your node. Keep in mind that it’s possible to do a DHT lookup on your PeerID and, particularly if your node is regularly running from the same location (like your home), find your IP address.
I don’t understand the question.
A PeerID is just a random cryptographic thing you control. Even assuming that a useless cryptographic key is personal data (which I don’t know), who is gonna send you the GDPR request? Yourself? What happens if you don’t comply? Are you gonna sue yourself?
I believe you are severely misunderstanding the issue.
Here we were talking about a service that hosts some client data on IPFS. The OP was worried that if that service received a GDPR notice, they couldn’t comply with it because they couldn’t remove files from IPFS.
What I’m saying is that IPFS allows you to remove files from your own nodes, however you can’t force other servers to do the same, just like HTTP.
All of the documentation you see explains what an adversarial actor could do. IPFS doesn’t record peer IDs long term by default, because logging useless random data wastes space on your disks (it only records PeerIDs in the DHT and the peerstore, which is pruned after 2 days at worst).
Thanks @Jorropo. That cleared up quite a few things for me.
who is gonna send you the GDPR request? Yourself? What happens if you don’t comply? Are you gonna sue yourself?
I believe you are severely misunderstanding the issue.
So basically GDPR is completely non-applicable in the case of self-hosted dapps as far as I understand?
I don’t have the answer to this question.
My point is that the only peer id you record is your own.
And I am confident that you are not gonna sue yourself over not deleting your own peer id.
IPFS doesn’t record peerids by default because it’s useless.
If you enable logging of all DHT or bitswap requests and save the logs to disk, your questions might make sense (as bitswap or the DHT will log the peer IDs of people making requests).
However, complying then seems easy: someone sends you a GDPR notice for peer ID XYZ (you could maybe ask them for a signature from that peer ID to prove it’s them, if you care about that?).
You just run
grep -v XYZ logs > logs.with.this.peerid.filtered
and remove the original logs file.
(That assumes peer IDs fall under the GDPR, which I don’t know if they do.)
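For anyone who prefers to script this rather than run grep by hand, the same filtering could be sketched in Python as follows. This is only an illustration: the function name scrub_peer_id and the log format (one plain-text entry per line) are assumptions, not anything IPFS provides. It rewrites the file in place, so there is no leftover unfiltered copy to delete afterwards.

```python
import os
import tempfile


def scrub_peer_id(log_path: str, peer_id: str) -> int:
    """Drop every line containing peer_id from the log; return how many were dropped."""
    removed = 0
    log_dir = os.path.dirname(os.path.abspath(log_path))
    # Write the filtered copy next to the original, then swap it into place.
    with open(log_path, "r", encoding="utf-8") as src, tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=log_dir, delete=False
    ) as dst:
        for line in src:
            if peer_id in line:
                removed += 1
            else:
                dst.write(line)
    os.replace(dst.name, log_path)
    return removed
```

Like grep -v with a plain string, this matches the peer ID anywhere in the line, so it also removes lines where the ID appears inside a longer address.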
In my case every peer has a full copy of a distributed orbit database which holds all peer ids. So I guess it is still relevant. Just to be on the safe side I will replace them with random numbers.
Actually, now that I think about it, if you save your IPFS logs and a node is doing bad stuff (like sending you files you didn’t ask for, …), IPFS is also gonna output their CID in the logs. However, by default IPFS doesn’t save logs anywhere, it just outputs them to
But still if you want to remove a peer id, just
grep -v it out of your logs.
With that context.
Ok your question makes sense.
That I just don’t know.
if a node is doing bad stuff (like sending you files you didn’t ask for
Isn’t that impossible by default on the network, or are you describing an attacker hacking the system?
In my case everyone who decides to use the app hosts it. The app doesn’t involve storing (besides itself) or sending files.
That could be a bug, someone trying to make you host files for free, or someone trying to DoS you by sending useless data.
That’s a thing that happens from time to time. AFAIK it does nothing except waste bandwidth.
The public key of a natural person is a unique identifier and its use in online services is generally associated with other types of information that make it possible to identify and profile the person holding such a key. Under these conditions, the public key is personal data that uniquely identifies a person and thus its processing is subject to the provisions of the GDPR, although it can be considered as a method of pseudonymisation insofar as it can conceal a person’s real name.
That answers the question as far as peer id is concerned.
I arrived at the conclusion that if IPFS already makes all peer IDs public and this is not an issue, then it’s not an issue for dapps on the network either, because the IDs were revealed before the dapp was involved.
This is a very interesting discussion. Of course, GDPR is the topic here, but there are other data privacy regulations that might intersect with this discussion. California CPRA comes to mind, and there are others. Just a few additional thoughts:
In many respects IPFS might indeed be like a telecoms company. Telcos are not currently responsible for content that travels over their networks in the US. Protocol Labs might make this argument if faced with a GDPR compliance action. But IPFS also provides a web interface to the underlying distributed database. Would this interface fall under some GDPR requirements? I don’t think this part of the question is fully answered.
Any developer of an application that uses IPFS (and related technologies) would almost certainly fall under GDPR requirements, I think. The ability of a data subject to request deletion (already discussed), pre-approve data sharing, request cessation of data sharing, get a list of entities who received a copy of the shared data, and so forth, would be in effect I think.
Article 34 of GDPR, and related recitals, has already been mentioned. Node IDs and CIDs seem like they are pseudonymized, but IP addresses are certainly private information covered under many regulations.
So I think developers need to be aware of this issue and be prepared. Arguing that IPFS can’t be controlled won’t hold water. The regulator can just say “Don’t use IPFS”. That would not be a great outcome.
I am also not convinced yet that Protocol Labs can argue that GDPR does not apply to them. It may be the case, but I just don’t think that is a fully resolved issue. See the comment about the web interface, and IPFS Desktop is probably relevant.
I am not claiming to be an expert in GDPR compliance. I will leave that to others who are more qualified than I am. I have, however, fully read GDPR, CPRA, HIPAA, PCI, NY-DFS and other compliance regulations. And I have built applications that meet these compliance regulations. I am of the opinion that it would behoove Protocol Labs and the related IPFS development community to pay attention to this issue and provide some guidance to developers. I love this technology (have a crypto background) and would hate to see IPFS encounter regulatory issues that are plaguing cryptocurrencies now.
Just my 2 cents.
I can’t comment on the full aspect of regulations but when it comes to public peer ids it seems to be pretty simple. They are already public as well as geo location in the IPFS desktop interface. If they are not compliant with any regulation they have to be masked/hidden at IPFS level and not at developer level where they have already been exposed before the dapp even existed. The case with peer ids is vastly different to that of user data flowing through telcos since it’s a default provider functionality, not user or developer generated content. Other than that I agree that when it comes to the dapp itself and user generated content in the app it’s the developer’s responsibility.