Best IPFS practice for event or time series data

I have a small Particle network with sensors periodically reporting time-stamped data (temperature, humidity, etc.). I want to store that data, and IPFS seems like a perfect distributed way to do so. I also want to view the last, say, 48 hours on a web page.

I have a few go-ipfs and ipfs-cluster processes running, so I kind of understand that part.

I “think” the answer has these ingredients:

  1. IPFS gateway (for posting updates)
  2. IPNS pubsub to be aware of newly posted data
  3. some kind of ipfs-cluster awareness of updates to ensure distributed storage.

What is the recommended way to go about this?

It is difficult to answer this without knowing more about your setup.

  • Are you running IPFS nodes at the places where the data is gathered?
  • What quantities of data are we talking about?

With those ingredients you should definitely be able to roll your own solution. However, you might consider using something like https://github.com/orbitdb/orbit-db if the amount of data is not that high.

FYI I have added this to ipfs-cluster:

If you end up coding something along those lines, let us know!

The data is gathered using a sensor connected to an Arduino-ish device, e.g. a Particle. Via the Particle cloud service, the sensor data is published over HTTP to a data aggregation/collection service written in Go. The data rate is approx. 1K per minute per sensor. I currently have only about 5 sensors, but I am designing for 10^3.

Data aggregation/collection options I see:

  1. Use a go-ipfs gateway… where does pubsub fit here?
  2. Add IPFS client libraries to my Go service… how do I know if/when data is stored?

User interface use cases:

  1. Latest measurement per sensor: Use the IPFS JS client to subscribe via pubsub to new data. What is the upper limit on the number of topics? Is there an advised message/topic ratio? E.g. given 10^3 sensors, should I create a topic per sensor?
  2. Sparkline per sensor: Use the IPFS JS client to fetch a “window” of data… not sure how to fetch a window. Encode time in the file name?

Thank you in advance for any/all time spent thinking about this. If you happen to be in Portland, Oregon I’d be happy to buy you a beer/coffee or two for your time.

  • So 1K samples per minute per sensor? What are the latency requirements? When do you need to see new data in your web ui?

  • So the “data aggregation/collection golang service” is not really distributed, but a central cluster that can possibly run ipfs?

Regarding the number of topics: I would try a single one and see how far that gets you. 1000 messages per second is not much in a data center. And if you don’t have enough network bandwidth, splitting the load over multiple topics is not going to help.
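To illustrate the single-topic approach: if every message carries its sensor ID, a subscriber can keep the latest measurement per sensor without needing one topic per sensor. The sketch below assumes JSON messages with hypothetical field names (`sensor_id`, `time`, `temperature_c`, `humidity_pct`) and only shows the handling of a received payload, not the pubsub subscription itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Sample is a hypothetical message format shared by all sensors on one topic.
type Sample struct {
	SensorID     string    `json:"sensor_id"`
	Time         time.Time `json:"time"`
	TemperatureC float64   `json:"temperature_c"`
	HumidityPct  float64   `json:"humidity_pct"`
}

// Latest maps a sensor ID to the newest sample seen for that sensor.
type Latest map[string]Sample

// Apply decodes one raw message and updates the map, ignoring messages
// that are older than (or as old as) what is already stored.
func (l Latest) Apply(raw []byte) error {
	var s Sample
	if err := json.Unmarshal(raw, &s); err != nil {
		return err
	}
	if s.SensorID == "" {
		return fmt.Errorf("message has no sensor_id")
	}
	if prev, ok := l[s.SensorID]; ok && !s.Time.After(prev.Time) {
		return nil // stale or duplicate delivery
	}
	l[s.SensorID] = s
	return nil
}

func main() {
	latest := Latest{}
	msg := []byte(`{"sensor_id":"s1","time":"2019-03-10T12:00:00Z","temperature_c":21.5,"humidity_pct":40}`)
	if err := latest.Apply(msg); err != nil {
		fmt.Println("bad message:", err)
		return
	}
	fmt.Println(latest["s1"].TemperatureC)
}
```

Comparing timestamps rather than trusting arrival order matters because pubsub messages can arrive late or more than once.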

Old topic but hey, this might be interesting for people looking into similar needs:
http://blog.klaehn.org/

(For discoverability: … blog post was first seen in this sibling thread.)