Interplanetary telemetry compression

I released an open source library for telemetry encoding and compression on IPFS.

It transforms an arbitrary array of JSON values into a columnar representation, compresses the columns, and stores the result on IPFS.

For telemetry data the compression can be very good, much better than just gzipping the JSON array. Since there seem to be several people using IPFS for storing sensor data or other time series data, I thought this might be useful.
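To give a rough idea of the columnar step, here is a minimal sketch in Python. It is not the library's actual encoding: it assumes flat records that all share the same keys, uses zlib from the standard library as the compressor, and leaves out the IPFS storage step entirely. All function names are made up for illustration.

```python
import json
import zlib


def to_columns(records):
    """Turn a list of flat JSON objects into {key: [value, value, ...]}.

    Assumption: every record has the same keys (no nesting, no missing fields).
    """
    keys = sorted({k for r in records for k in r})
    return {k: [r.get(k) for r in records] for k in keys}


def compress_columns(columns):
    """Serialize each column as compact JSON and compress it on its own."""
    return {
        k: zlib.compress(json.dumps(v, separators=(",", ":")).encode("utf-8"))
        for k, v in columns.items()
    }


def decompress_columns(blobs):
    """Inverse of compress_columns."""
    return {k: json.loads(zlib.decompress(b).decode("utf-8")) for k, b in blobs.items()}


def from_columns(columns):
    """Rebuild the original list of records from the columns."""
    length = len(next(iter(columns.values()), []))
    return [{k: columns[k][i] for k in columns} for i in range(length)]


# Hypothetical telemetry samples, only for the demo.
records = [
    {"time": 1000 + i, "sensor": "temp", "value": 20.0 + (i % 3)}
    for i in range(1000)
]

blobs = compress_columns(to_columns(records))
restored = from_columns(decompress_columns(blobs))

# Sizes to compare: compressing the row-oriented array vs. the separate columns.
row_size = len(zlib.compress(json.dumps(records, separators=(",", ":")).encode("utf-8")))
column_size = sum(len(b) for b in blobs.values())
print(row_size, column_size)
```

Storing each column separately means the compressor sees runs of similar values (all timestamps together, all sensor names together), which is where the gain on telemetry data comes from.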

Here is a blog post describing how it works in detail.


Thanks for sharing. This was a great read (also thanks to the links to additional reading material like Rison).

This was a very cool blog post, thank you for sharing!

I didn’t see Badger mentioned. Given the small objects, you will likely get drastically better performance in go-ipfs with it.

See https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#badger-datastore for more info. It is still experimental (YMMV), but I have been using it without hiccups so far.

@stebalien and @whyrusleeping may have more info on what to watch out for


We will definitely check it out. We have lots of small blocks in the applications we are developing for Actyx.

By the way: we have been bitten by the fact that you can create a block > 4 MB, but not bitswap it: https://github.com/ipfs/go-ipfs/issues/4473#issuecomment-350396836. This led to a production issue this week. It should be better documented, or preferably it should not be possible at all to create a block that cannot be sent over the network.

One thing I dislike about compressing data before storing as dag objects is that the content is then an opaque blob and no longer a meaningful IPLD object. So ideally IPFS should transparently compress data using a fast compression algorithm such as zstd before storing, and optionally send it over the wire in the same compressed form. If you hash before compressing this can be completely transparent.
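To make the "hash before compressing" idea concrete, here is a minimal sketch of a block store that does it. This is not how go-ipfs works today; it is an in-memory toy with assumptions: zlib stands in for zstd (which is not in the Python standard library), a hex SHA-256 digest stands in for a real CID, and `CompressingBlockStore` is a made-up name.

```python
import hashlib
import zlib


class CompressingBlockStore:
    """Toy block store: keys are hashes of the UNCOMPRESSED bytes,
    values are held in compressed form."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # Hash first, so the key does not depend on the compression used.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = zlib.compress(data)
        return key

    def get(self, key: str) -> bytes:
        data = zlib.decompress(self._blocks[key])
        # Verify against the key, exactly as for an uncompressed store.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block does not match its hash")
        return data

    def stored_size(self, key: str) -> int:
        """Number of bytes actually held for this block."""
        return len(self._blocks[key])


store = CompressingBlockStore()
data = b'{"sensor":"temp","value":21.5}' * 100
key = store.put(data)
print(len(data), store.stored_size(key))
```

Because the key is computed from the uncompressed bytes, callers see the same hashes and the same content whether or not compression is switched on, and the compressed form could also be what goes over the wire.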

You could create another IPLD Format which extracts the object (kind of like another view on the blob). I’m on the JS side of IPLD, so I don’t know how easy that would be in Go.

I don’t really know how that would work. So far I have always treated IPLD dag objects as just JSON with IPFS links.

But why would anybody not want their dag objects to be stored and transferred compressed? What’s the downside of transparent compression?

This should be an internal feature of IPFS, like the Content-Encoding: gzip mechanism in HTTP.

Agreed about using Badger. I’ve been using BadgerDS on my IPFS nodes for the app I’m developing without any problems so far.

It should be possible for you to modify the maximum block size (https://github.com/libp2p/go-libp2p-net/blob/master/interface.go#L20), although I’m not sure how that would affect compatibility with the rest of the network.

Could you perhaps describe more about this production issue you ran into? I would be curious to know more about the details since I suspect I will run into this issue in the next month.

I read somewhere that Badger has high memory usage. Since we are running on low-power, low-memory ARM edge devices, this might be a problem. However, I will roll it out on a few cloud nodes and a few developer devices and see how it goes. Thankfully we have the infrastructure to painlessly roll out new IPFS versions.

I think it is best to stick with the defaults. We are using a pretty niche (for now) technology, so at least we want to stick to the settings other people are using. We have just adjusted our chunking algorithm to never exceed 4 megabytes.
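For anyone wanting to do the same, a size-capped chunker could look roughly like the sketch below. This is not our actual code: it assumes the records are serialized as a compact JSON array (a real dag node is serialized differently, so the sizes are only an approximation), and the 4 MB constant is just the figure from this thread, not a value read from go-ipfs.

```python
import json

# Assumption: the limit discussed in this thread, not taken from go-ipfs itself.
MAX_BLOCK_BYTES = 4 * 1024 * 1024


def encoded_size(records):
    """Size in bytes of the records as a compact JSON array."""
    return len(json.dumps(records, separators=(",", ":")).encode("utf-8"))


def chunk_records(records, max_bytes=MAX_BLOCK_BYTES):
    """Split records into consecutive chunks whose JSON encoding fits in max_bytes."""
    chunks, current = [], []
    size = 2  # the enclosing "[" and "]"
    for record in records:
        item = len(json.dumps(record, separators=(",", ":")).encode("utf-8"))
        if item + 2 > max_bytes:
            # Fail loudly instead of silently creating an oversized block.
            raise ValueError("single record exceeds the block size limit")
        extra = item + (1 if current else 0)  # +1 for the separating comma
        if size + extra > max_bytes:
            chunks.append(current)
            current, size = [], 2
            extra = item
        current.append(record)
        size += extra
    if current:
        chunks.append(current)
    return chunks


# Small demo with an artificially low limit so that several chunks are produced.
demo_records = [{"i": i, "payload": "x" * 50} for i in range(100)]
demo_chunks = chunk_records(demo_records, max_bytes=500)
print(len(demo_chunks))
```

The important part is the explicit error for a single oversized record: that is the case that would otherwise produce a block nobody else can fetch.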

It was exceedingly simple. We generate data on multiple devices. Due to some strange circumstances and an application-level bug, one of the devices created an IPFS dag node that was > 4 MB, and we were not able to get that hash on the other devices.

So the whole system got stuck and it took me a while to figure out what was going on.

Awesome, thank you for the detailed response! Ah yeah, in that situation it would definitely make sense to stick with the defaults. If I have time, I may try some testing on a private network with blocks larger than 4 MB.

Thanks again.

IPLD dag objects are binary blobs with a JSON-like representation. So it can be JSON, but could also be anything else.

Agreed.

Yes, I know that it is converted into CBOR before hashing, so the canonical representation that is used for hashing is CBOR. But as a mere user of the IPFS API, I don’t really have to care. Which is nice. And it should be exactly the same for transparent compression…

Cool. So what would it take to make it happen?

First thing would be opening an issue at https://github.com/ipfs/go-ipfs.