Why has no blockchain yet been established on IPFS?

Hi there,

If I understand IPFS correctly, it should be possible to set up linked content on it, right?

I mean, where is the mistake in this “poor man’s blockchain”:

  1. you store a fixed amount of transactional data in a directory (e.g. 100 transactions) as block 0
  2. you pin the data on IPFS
  3. you grab the hash of the directory, and post it somewhere for verification purposes

for every subsequent block:

  • you store the hash of the previous block,
  • then add your files to it
  • repin it
  • grab hash for next block

In my opinion, this gives you a backward-linked, blockchain-like structure that you can follow from the current block back to the very first one. And that first one needs to be the creator (genesis) block, i.e. its hash must match the published one.
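To make the idea concrete, here is a minimal sketch of the steps above. It is only an illustration under assumptions: a plain dictionary stands in for IPFS, a SHA-256 hex digest stands in for the hash IPFS would return (it is not a real content ID), and the helper names `publish` and `walk_back` are made up for this example.

```python
import hashlib
import json
from typing import Dict, List, Optional

# Assumption: an in-memory dict stands in for IPFS (content looked up by hash).
store: Dict[str, dict] = {}


def block_hash(block: dict) -> str:
    """Stand-in for the hash IPFS would return; NOT a real content ID."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def publish(transactions: List[str], prev_hash: Optional[str]) -> str:
    """Store a block that embeds the hash of the previous block."""
    block = {"prev": prev_hash, "transactions": transactions}
    h = block_hash(block)
    store[h] = block
    return h


def walk_back(head_hash: str) -> List[str]:
    """Follow the 'prev' links from the newest block down to block 0."""
    path = []
    current: Optional[str] = head_hash
    while current is not None:
        block = store[current]
        # The content must hash to the key it was stored under.
        if block_hash(block) != current:
            raise ValueError("block content does not match its hash")
        path.append(current)
        current = block["prev"]
    return path


genesis_hash = publish(["tx-0", "tx-1"], None)   # block 0, hash gets published
block1_hash = publish(["tx-2"], genesis_hash)
block2_hash = publish(["tx-3", "tx-4"], block1_hash)

path = walk_back(block2_hash)
print("reaches published genesis:", path[-1] == genesis_hash)
```

Anyone holding only the newest hash can walk back and compare the last hash they reach with the published one.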

Has this been done before? If not, why not?

It would even solve the problem of not being able to store files on a blockchain and free each datablock from a fixed datastructure.

This is a serious question - did I stumble upon something here?

You can do that. Each block contains the hash of the previous one, so technically, it’s a block-chain.
But that alone is not enough to make it useful, nor to call it a Blockchain in the sense we commonly use. What you describe is just a publicly appendable graph (which can be useful too, don’t get me wrong!).

Everybody can upload any version of block 1 referring to block 0. There can be hundreds of concurrent blocks tagged “#1”, each containing some data plus the hash of #0. That’s fine and it enables traceability, but it lacks what blockchain was designed for: eventually, you should get only one version of the truth (eventual consistency). For that, you need rules to define what is “valid” and what should be rejected, and you need a consensus mechanism to decide collectively which version, among all the valid candidates, is the one we agree upon.

To take Bitcoin’s example, the rules are:

  • Every transaction’s outputs cannot exceed its inputs (value-wise); any difference goes to the miner as a fee
  • The input from an address cannot be more than the balance of this address (you can’t have a negative balance)
  • You have to prove that you know an address’ private key to send BTC from it
  • And that’s about it.

This is easy to check for anyone, and every node can instantly do the math to check that each transaction is valid (less and less true for Ethereum, but that’s another topic).

According to that, if I have a balance of 1 BTC, sending 1 BTC to you is possible, and sending 1 BTC to Nakamoto is possible. But the 2 transactions cannot coexist. So if I publish each transaction in a different block, they are both “valid transactions” in “valid blocks”.
But they describe 2 different states of the Distributed Ledger, aka The State We Should All Agree Upon, so we have to get rid of one.
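Here is a small sketch of that conflict. It assumes the simplified balance model used in this post (real Bitcoin tracks unspent outputs rather than balances) and leaves out the signature check; the names `is_valid` and `apply_tx` are made up for the illustration.

```python
def is_valid(tx: dict, balances: dict) -> bool:
    """Simplified rule: the sender must be able to cover the amount."""
    return tx["amount"] > 0 and balances.get(tx["sender"], 0) >= tx["amount"]


def apply_tx(tx: dict, balances: dict) -> dict:
    """Return the new state after a valid transaction (input is not mutated)."""
    if not is_valid(tx, balances):
        raise ValueError("invalid transaction")
    new_state = dict(balances)
    new_state[tx["sender"]] -= tx["amount"]
    new_state[tx["receiver"]] = new_state.get(tx["receiver"], 0) + tx["amount"]
    return new_state


state = {"me": 1}  # I own 1 BTC
tx_to_you = {"sender": "me", "receiver": "you", "amount": 1}
tx_to_nakamoto = {"sender": "me", "receiver": "nakamoto", "amount": 1}

# Each transaction is valid on its own against the current state...
state_a = apply_tx(tx_to_you, state)
state_b = apply_tx(tx_to_nakamoto, state)

# ...but they lead to two different states, and neither can follow the other.
print("both valid alone:", is_valid(tx_to_you, state), is_valid(tx_to_nakamoto, state))
print("second still valid after first:", is_valid(tx_to_nakamoto, state_a))
```

The validity rules alone cannot pick between `state_a` and `state_b`; that is the job of the consensus mechanism.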

To collectively choose one, we need a consensus mechanism.
In Bitcoin, for example:

  • Each block refers to exactly one block: the most recent one
  • The commonly accepted chain is the longest one (more precisely, the one with the most accumulated proof of work)
  • To avoid flooding, blocks are made hard to produce (that’s the Proof of Work), so that the time needed to generate one is much longer than the time needed for everyone in the network to receive the mined block and verify it.
  • If a block only contains valid transactions, contains the hash of the previous block and passes the Proof of Work challenge, it becomes the new accepted version of the truth. All future mined blocks that don’t build on it will be dismissed
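The list above can be sketched as a toy. This is not Bitcoin's actual algorithm: the difficulty is a made-up constant (a number of leading hex zeros), transactions are opaque strings that are not validated, and the fork choice simply counts blocks instead of accumulated work.

```python
import hashlib
from typing import List

DIFFICULTY = 3        # hypothetical: required number of leading hex zeros
GENESIS = "0" * 64    # hypothetical hash of an agreed-upon block 0


def digest(prev_hash: str, payload: str, nonce: int) -> str:
    return hashlib.sha256(f"{prev_hash}|{payload}|{nonce}".encode()).hexdigest()


def mine(prev_hash: str, payload: str) -> dict:
    """Try nonces until the block hash meets the difficulty target."""
    nonce = 0
    while True:
        h = digest(prev_hash, payload, nonce)
        if h.startswith("0" * DIFFICULTY):
            return {"prev": prev_hash, "payload": payload, "nonce": nonce, "hash": h}
        nonce += 1


def chain_is_valid(chain: List[dict]) -> bool:
    """Check the links and the proof of work of every block (cheap to verify)."""
    prev = GENESIS
    for block in chain:
        expected = digest(block["prev"], block["payload"], block["nonce"])
        if block["prev"] != prev or block["hash"] != expected:
            return False
        if not expected.startswith("0" * DIFFICULTY):
            return False
        prev = block["hash"]
    return True


def choose_chain(candidates: List[List[dict]]) -> List[dict]:
    """Toy fork choice: the longest valid chain wins."""
    valid = [c for c in candidates if chain_is_valid(c)]
    return max(valid, key=len)


# Two miners build competing blocks on top of the same first block.
b1 = mine(GENESIS, "me -> you: 1")
b2a = mine(b1["hash"], "you -> alice: 1")
b2b = mine(b1["hash"], "you -> bob: 1")
b3b = mine(b2b["hash"], "bob -> carol: 1")

chain_a = [b1, b2a]
chain_b = [b1, b2b, b3b]
winner = choose_chain([chain_a, chain_b])
print("accepted tip payload:", winner[-1]["payload"])
```

Mining takes a few thousand hash attempts per block here, while checking a block takes a single one; that asymmetry is the point of the third bullet.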

So, for an open distributed file system to be called a blockchain, you need rules defining a valid input, and you need a consensus mechanism to resolve concurrent propositions of the truth (aka state).

NB: Blockchain is just one solution to ensure eventual consistency. Other strategies exist, so we group them under the broader term “Distributed Ledger”. For example, IOTA makes each transaction refer to 2 other transactions rather than grouping them into blocks that each point to only one previous block.

NB2: There are many consensus mechanisms. Some are more or less centralized (up to the extremely stupid: Node Omega gets to decide what the truth is). It depends on your use case.

NB3: If you want your blockchain to be truly decentralized, you have to limit what can be uploaded (in volume or in validation complexity). Otherwise, small validators can be flooded, and only very “powerful” computers get to validate. On Bitcoin you only have to validate less than 10 MB an hour: that’s nothing, and even the smallest app can do it.

NB4: That’s why most blockchains don’t allow massive data dumps onchain (or rather make them prohibitively expensive).

NB5: That’s why when you want to put big dumps “onchain”, you really put a link to it (usually with proof of authenticity such as a hash and/or a signature). The link can be an address or a Content ID.

NB6: You guessed it: There are already many projects using blockchain to publish/do stuff with data, but only store a hash to it onchain, and store the data itself… on IPFS!
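The pattern from NB5 and NB6 can be sketched like this. It is only an illustration: the link is a placeholder rather than a real content ID, a plain SHA-256 digest stands in for the proof of authenticity, and the record layout is invented for the example.

```python
import hashlib


def make_onchain_record(link: str, data: bytes) -> dict:
    """Small record meant for the chain: a link plus the hash of the data."""
    return {"link": link, "sha256": hashlib.sha256(data).hexdigest()}


def verify_offchain_data(record: dict, data: bytes) -> bool:
    """Anyone who fetches the data can check it against the onchain hash."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]


big_dump = b"sensor log line\n" * 100_000      # about 1.6 MB kept offchain
record = make_onchain_record("ipfs://<content-id-placeholder>", big_dump)

print("bytes offchain:", len(big_dump))
print("hash stored onchain:", record["sha256"])
print("untouched data verifies:", verify_offchain_data(record, big_dump))
print("modified data verifies:", verify_offchain_data(record, big_dump + b"x"))
```

The chain only has to carry and validate the tiny record, whatever the size of the dump it points to.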

Bonus: Actually, IPFS could also be used as the transport and storage layer of your blockchain. I guess the whole Bitcoin Blockchain has already been uploaded somewhere on IPFS… :wink:
So to make a blockchain on top of IPFS, just add 2 things: rules and a consensus mechanism.

Hello Akita,

first of all, I would like to thank you for taking the time and effort to point out these things to me.

I am actually working in the field of blockchains, so I am aware of (some of) the issues that you raised. But you also raised some valid points (namely about trashing the blockchain) that I naively overlooked…

My approach is to check if we cannot leverage IPFS to create a faster and more flexible blockchain solution in that you get the hash chain for free. Also, you can “store and forget” blocks of arbitrary size on IPFS - which then again can become a problem due to the need to synchronize the different nodes in the network.

I am not saying that no additional software needs to be developed, I was just wondering about the storing part.

Say you find a solution to trashing the blockchain (e.g. you limit the size of a data block OR prevent the need to synchronize all data amongst all nodes) - do you know if there is a way to prevent anyone from overwriting/changing existing blocks? Can one set rights to an existing directory on IPFS or can anyone write to any directory?

Hello _think,

I’m not in the field myself, I’m just an enthusiast :). So what I write is to be taken lightly.

That being said, I guess using IPFS as a storage layer is a no-brainer. A blockchain doesn’t care how you got your block (via another peer, on a USB stick, from BitTorrent, etc.), as long as your interactions with it are valid and take into account the last state of the chain. With its dependencies such as libp2p, you get routing, peer discovery, etc. for free.

Say you find a solution to trashing the blockchain (e.g. you limit the size of a data block OR prevent the need to synchronize all data amongst all nodes) - do you know if there is a way to prevent anyone from overwriting/changing existing blocks?

Yes, there is. A consensus mechanism.
If you keep a copy of the old block available (i.e. you pin it), it will stay available, and nobody will be able to modify this chunk of data. BUT, anyone can provide an alternative block, and to the rest of the network the two look equally legitimate: the newer version can be considered a valid modification, or unwanted noise to be discarded.
So IF you want ONE version, you need a consensus mechanism. But again, as you know, there are many. Some are really simple.
But maybe you don’t need ONE version, and concurrent versions are ok? (like many concurrent Git branches)

Can one set rights to an existing directory on IPFS or can anyone write to any directory?

Conceptually, one can set rights on an existing directory. Look up IPNS. It provides a hash that you can publish, and that will link to some resource. If you have access to the private key, you can update what it links to. So you can update the “directory”. Share the key with trusted peers, or add an authentication mechanism to manage this key, and you get a publicly accessible and privately updatable directory.
By the way: “The state described in this directory is the truth” is a consensus mechanism! (Only not a very decentralized one…). You will probably need to refine this.
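A toy model of such a key-gated pointer, to show the difference with a plain content hash. Loud assumption: real IPNS records are signed with a private key and checked with the matching public key; Python's standard library has no public-key signatures, so this sketch substitutes an HMAC shared secret, which means the resolver itself has to know the key. The `NameRegistry` class and all names in it are invented for the illustration.

```python
import hashlib
import hmac


def sign(key: bytes, sequence: int, content_hash: str) -> str:
    """HMAC tag over the record; a stand-in for a real signature."""
    message = f"{sequence}:{content_hash}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class NameRegistry:
    """Toy resolver mapping a stable name to the latest content hash."""

    def __init__(self) -> None:
        self._keys = {}      # name -> key (a real system would hold a public key)
        self._records = {}   # name -> (sequence, content_hash)

    def register(self, key: bytes) -> str:
        name = hashlib.sha256(key).hexdigest()[:16]
        self._keys[name] = key
        return name

    def publish(self, name: str, sequence: int, content_hash: str, tag: str) -> bool:
        """Accept an update only with a valid tag and a higher sequence number."""
        expected = sign(self._keys[name], sequence, content_hash)
        current_sequence = self._records.get(name, (-1, ""))[0]
        if not hmac.compare_digest(tag, expected) or sequence <= current_sequence:
            return False
        self._records[name] = (sequence, content_hash)
        return True

    def resolve(self, name: str) -> str:
        return self._records[name][1]


registry = NameRegistry()
owner_key = b"hypothetical-secret-key"
name = registry.register(owner_key)

v1 = hashlib.sha256(b"directory, version 1").hexdigest()
v2 = hashlib.sha256(b"directory, version 2").hexdigest()

first_ok = registry.publish(name, 1, v1, sign(owner_key, 1, v1))
forged_ok = registry.publish(name, 2, v2, sign(b"someone-else", 2, v2))
after_forgery = registry.resolve(name)
second_ok = registry.publish(name, 2, v2, sign(owner_key, 2, v2))

print("name stays the same, now points to v2:", registry.resolve(name) == v2)
```

The name never changes while the content hash behind it does, and the old version `v1` is still a valid hash for whoever kept the data.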

Basically, it will all depend on your use case. You got several tools in your toolbox:

  • Datastructure (Blockchain, IOTA-like datastructure, etc.)
  • Rules for valid input
  • Consensus Mechanism if you need it
  • …
    … that you will combine differently depending on what you want to do.

To go further, we need to know if you want…:

  • everybody to be able to see the data?
  • everybody to be able to append the data?
  • everybody to be able to verify the data (or having a few trusted peers is fine, 'cause that’s for an institution anyway)
  • rules on acceptable inputs
  • everybody to validate the whole chain or just “branches”
  • speed or consistency
  • heavy data to be part of the chain or stored offchain. (Onchain means these data are validated by your consensus and can be trusted to have enforced properties, offchain means your DLT is more decentralized and quick).
  • …

My approach is to check if we cannot leverage IPFS to create a faster and more flexible blockchain solution in that you get the hash chain for free.

What do you mean by “faster” in this context?
If you mean faster synchronization of peers in the network (peers getting the last state of the DLT faster), I guess IPFS will be a good solution for long-lasting nodes (which have found peers with good connection and uptime).
If you mean time for uploaded data (offchain) to be available for peers: it will be slow at first because peers have to find you to get it. Then very quick if it’s popular.
If you mean time for uploaded data (onchain) to be available for peers: it depends on your data structure. If any peer should touch all data anyway, it will have to be published first, but it will be popular soon.
If you mean time for uploaded data to be validated: it will depend on your consensus mechanism. There will be tradeoffs.
AFAIK, the hash chain won’t come for free: you will have to implement it. But maybe check IPLD. This part is a bit fuzzy for me, but it might be interesting.

So… what is your use case? :slight_smile:

(And just a reminder about the “store and forget” part, for new readers: data on IPFS are always available… until they are unpinned and not requested! So if you want to store the ledger, you can assume it will be used and pinned. Nice. If you want to store massive data, specific to a user, that is not likely to be stored in full by any peer, and only accessed from time to time by peers, you have to pin it (or make it pinned by someone) if you want it to be persistent. Like with the cloud, the data is not magically “nowhere”, and someone has to store it.)
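For new readers, that reminder can be sketched as a toy model. It is a simplification under assumptions: nodes are plain Python objects, a SHA-256 digest stands in for the content ID, and garbage collection simply drops everything that is not pinned. The `Node` class is invented for the illustration and is not the IPFS API.

```python
import hashlib
from typing import Dict, List, Set


class Node:
    """Toy peer: holds blocks locally and remembers which ones are pinned."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.pins: Set[str] = set()

    def add(self, data: bytes, pin: bool = True) -> str:
        content_hash = hashlib.sha256(data).hexdigest()
        self.blocks[content_hash] = data
        if pin:
            self.pins.add(content_hash)
        return content_hash

    def fetch_from(self, other: "Node", content_hash: str) -> bytes:
        """Requesting data leaves a cached, unpinned copy on this node."""
        data = other.blocks[content_hash]
        self.blocks[content_hash] = data
        return data

    def unpin(self, content_hash: str) -> None:
        self.pins.discard(content_hash)

    def collect_garbage(self) -> None:
        """Drop every block that is not pinned."""
        self.blocks = {h: d for h, d in self.blocks.items() if h in self.pins}


def is_available(content_hash: str, nodes: List[Node]) -> bool:
    return any(content_hash in node.blocks for node in nodes)


author, reader = Node(), Node()
content_hash = author.add(b"ledger block 42")   # pinned by its author
reader.fetch_from(author, content_hash)         # cached by a reader

author.unpin(content_hash)
author.collect_garbage()
still_there = is_available(content_hash, [author, reader])   # reader's cache

reader.collect_garbage()
gone = not is_available(content_hash, [author, reader])      # nobody stores it

print("after author unpins:", still_there, "| after reader cleans up, gone:", gone)
```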

Hello Akita,

I do not have a specific use-case in mind - seriously, this is my own idea and does not pertain to my job, although I am doing blockchain related work there as well.

To be honest with you, I was thinking about a fast blockchain that could be used for attestation purposes (think “notary services”), much like people nowadays publish hashes on the Bitcoin blockchain for data they want to prove later on. But with Bitcoin, it costs you money to store data there, plus, it hasn’t really been designed for this…

I was therefore thinking about a general-purpose hash-data saving blockchain for all kinds of things - a free blockchain to which you could post short snippets of data in order to prove their existence at a specific point in time. It is my understanding that this will be a very common scenario in the future if we do indeed get smart sensors and IoT devices all over the place - you will need additional bookkeeping about the correctness and integrity of all the data these things will be producing, otherwise you can claim whatever you want if you can freely modify this data.

IPFS came to my mind because of its content-based transport - I didn’t fully think it through, but one of the things that annoys me with the current blockchains is the limitation of the data block sizes and the slow throughput (which, yes, yes, is usually the result of a consensus mechanism plus synchronization in order to get everyone on the same page).

My personal use-case would be an IPFS based public blockchain for general purpose use, with unrestricted block sizes (which, alas, is not possible for bigger file sizes, as you need to upload the data to the chain which may take time as well depending on the size of the data).

I guess my main focus was to decouple the block creation process from the storage problem so that blocks could be created much faster than to store/distribute their content.

So this is the part I am confused about. It was my understanding that if I store a blob of data on IPFS (pin it), I would have this fixed hash value under which it can be found. How can someone else provide another version of this very block, if the previous block points to its hash value?

I create a directory with files (=my blockchain payload) and store it on IPFS. I get a hash back. I take this hash, and add it to the next block …

I got it. THIS is the step - there are many people out there trying to do the same thing, which is why we need the consensus mechanism to agree who amongst us did indeed provide the next valid block. Ok, point taken.

Second use-case would be a blockchain with no arbitrary fixed limitation for its transactions/its payload.

But then again you could get the trashing problem (someone stalls the chain by uploading huge amounts of data), and this could be quite tricky as well, as you could be creating blocks ahead of time while not having uploaded the hash-forming payload of a previous block.

Speculative block building - is this a thing already? I know about competing chain ends in bitcoin though…

Hey,

For that, you can already publish on IPFS, and you have integrity for free since the content behind cannot be changed. The only thing storing the hash onchain gives you, is that it is roughly timestamped and that you don’t have to store and advertise the hash (because nodes do it for you, it’s now in the ledger). But you pay for that in most blockchains.
One way or the other, someone has to host the data. You either do it, pay someone to pin it, or put it onchain (paying tokens in almost all cases). I don’t see any pros for storing the data itself onchain, but I might be overlooking something.

NB: I know one DLT, IOTA, with no fee at all for the transaction, so you can store that there. (It might not be the only one.) The drawback is that your node has to participate if you want your transaction to be validated sooner and that it can be long to retrieve the hash since nodes only know a fraction of the ledger.

I was therefore thinking about a general-purpose hash-data saving blockchain for all kinds of things - a free blockchain to which you could post short snippets of data in order to prove their existence at a specific point in time.

You have to entice people into storing these hashes. Most (public) blockchains have cryptos and fees to finance the mining process helping the network. If you want a free DLT, you will have to find something else…

It is my understanding that this will be a very common scenario in the future if we do indeed get smart sensors and IoT devices all over the place - you will need additional bookkeeping about the correctness and integrity of all the data these things will be producing, otherwise you can claim whatever you want if you can freely modify this data.

No storage system can help you with correctness. Shit in, shit out. You have to check correctness before upload (but that’s another story). About integrity, any content ID will do (IPFS included). And if you sign every record you have authenticity. And if you link them, you don’t have timestamp, but you have an ordering and traceability, which can be enough for many use cases. So you may not need blockchain at all.

IPFS came to my mind because of its content-based transport

Yes, deduplication and host-agnosticism are nice to have.

I guess my main focus was to decouple the block creation process from the storage problem so that blocks could be created much faster than to store/distribute their content.

Yes, I agree, metadata should be onchain if needed (timestamping,etc), data should be offchain (IPFS).

So this is the part I am confused about. It was my understanding that if I store a blob of data on IPFS (pin it), I would have this fixed hash value under which it can be found. How can someone else provide another version of this very block, if the previous block points to its hash value?

By “block” here, I referred to “blockchain block”, not to “IPFS blob”. The latter are immutable indeed.

I create a directory with files (=my blockchain payload) and store it on IPFS. I get a hash back. I take this hash, and add it to the next block …

If you publish the directory as an IPFS hash, it’s immutable. If you publish it as an IPNS hash, AND you somehow give access to the key, new versions will come. (But you still have the old one.)

I create a directory with files (=my blockchain payload) and store it on IPFS. I get a hash back. I take this hash, and add it to the next block …
I got it. THIS is the step - there are many people out there trying to do the same thing, which is why we need the consensus mechanism to agree who amongst us did indeed provide the next valid block.

Yep.

Second use-case would be a blockchain with no arbitrary fixed limitation for its transactions/its payload.

Yes, this is tricky indeed.
Basically, the strategies are:

  1. Let the market decide the block size. That’s what Ethereum does with the gas price, but that leads to centralization of the validators. And the blockchain is not free to use.
  2. Shard the Ledger so that nodes don’t have to keep the whole blockchain to validate everything (soon Ethereum, IOTA, probably others)
  3. Keep everything you can put offchain, offchain.
  4. Don’t use a blockchain. That’s the way to go every time you can in my opinion.

In my opinion, you don’t need blockchain for “storing unlimited amount online”. You just need someone to host it. So do it, pay a host, or do a blockchain only for this incentivizing part: go to Storj or wait for Filecoin.

Ok, so several things:

  • for the reproducible and checkable tracking of sensor data, I would indeed need no blockchain, PROVIDED that the data is kept as recorded on IPFS (e.g. by pinning it, paying someone else to store it or by being responsible to host it)

Generally speaking, storing big chunks of data off a blockchain makes it vulnerable to deletion or censorship. Don’t like what you stored? Just pull the plug, and no one can get to it anymore. I’d like to avoid that, even for bigger piles of data.

Whatever has been stored, stays there as is.

So, after further consideration, I am probably indeed most interested in a fast linked graph for quick attestation purposes. I want to grab data from a sensor as fast as I can, hash the value, and hash the hash of the value with the time in order to link this final hash for the entry with the next one in order to build a retraceable, immutable and verifiable chain of events.

Basically, I want to make sure that whatever was recorded is being preserved with the according time of acquisition.
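One possible reading of that scheme, as a sketch. Assumptions: SHA-256 for every hash, readings and times passed in as strings, and each entry linked to the previous one. The timestamp is simply supplied by the recorder here, so nothing in this code makes it trustworthy. All function names are made up for the example.

```python
import hashlib
from typing import List

GENESIS = "0" * 64  # hypothetical starting value for the first link


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_entry(value: str, timestamp: str, prev_entry_hash: str) -> dict:
    value_hash = sha256_hex(value)                           # hash the value
    stamped_hash = sha256_hex(f"{value_hash}|{timestamp}")   # hash it with the time
    entry_hash = sha256_hex(f"{stamped_hash}|{prev_entry_hash}")  # link to previous
    return {
        "value": value,
        "timestamp": timestamp,
        "prev": prev_entry_hash,
        "entry_hash": entry_hash,
    }


def append(log: List[dict], value: str, timestamp: str) -> None:
    prev = log[-1]["entry_hash"] if log else GENESIS
    log.append(make_entry(value, timestamp, prev))


def verify_chain(log: List[dict]) -> bool:
    """Recompute every hash; any change to a value, time or link is detected."""
    prev = GENESIS
    for entry in log:
        expected = make_entry(entry["value"], entry["timestamp"], prev)
        if entry["prev"] != prev or entry["entry_hash"] != expected["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True


log: List[dict] = []
append(log, "21.4", "2018-05-01T12:00:00Z")
append(log, "21.6", "2018-05-01T12:00:01Z")
append(log, "21.5", "2018-05-01T12:00:02Z")

tampered = [dict(entry) for entry in log]
tampered[0]["value"] = "99.9"   # rewrite history

print("original verifies:", verify_chain(log))
print("tampered verifies:", verify_chain(tampered))
```

Publishing only the latest `entry_hash` is enough to commit to the whole history before it.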

In order to get that, I need a time server authority, an undisputed source of the current time of recording.

How do I get a (preferably signed) timestamp from IPFS for my content?

I guess you mean previous.
Yes, looks like a fine architecture to me, for the little I know. :slight_smile:

I’m not sure if there is a way to do that. Obviously, if ipfs does that, the timestamp has to come from the network, not from the node, since a modified node can change the timestamp before upload and sign that…

A guy already wondered if it could be done, but I don’t know the internals of the project well enough to give a proper definitive answer to that. I had some thoughts, though:

To be honest, having a reliable time stamp from IPFS looks difficult, as it can be forged by the node or by a lot of nodes under its control. Intuitively, I would say that you will either have to rely on an external source, trust some peers (maybe some special-purpose peers), or trust the whole network (forcing it to agree with a consensus mech… Woups, sorry :P).
External source can be an old school timestamp service, etc.

Or maybe the fact that the record was made after a certain date, plus the ordering provided by chaining hashes, is enough? Depends on what you wanna do with the timestamp, I guess.