
We have a Redpanda (Kafka-compatible) source with sensor data. Can we do the following:

  1. Every hour, compute the average sensor value over the previous hour for each sensor
  2. Write the results back to a topic
NorwegianClassic
  • Sure, ksql can do (tumbling, one hour) windowed averages. What have you tried? – OneCricketeer Jun 29 '22 at 18:59
  • But won't doing this create a continuously updated table/topic? I just want one single value for the average between two timestamps, for each sensor. – NorwegianClassic Jun 30 '22 at 13:12
  • It will, yes. If you create a table, though, you can query it externally, given a key, such as the starting hour. But also, you said you did want the results back into a topic, so what's wrong with a continuous stream? – OneCricketeer Jun 30 '22 at 13:52
  • It would make consuming these "average" messages a bit simpler. It would also fit well with our architecture for internal message queues. Materialize has this feature. But thanks, will think it through. – NorwegianClassic Jul 01 '22 at 07:08

1 Answer


You want to create a materialized view over the stream of events that other applications can query. Your source publishes the individual events to Kafka/Redpanda; another process observes the events and makes them available as queryable "tables" for other applications. A few options:

ksqlDB is likely the default choice, as it comes "native" with the Kafka/Confluent stack. Be careful about running it against your production Kafka cluster, though: it has a heavy impact on cluster performance. See the basic tutorial or the advanced tutorial.

Use an out-of-the-box solution for materialized views, such as Materialize. It is the easiest to set up and use, and it doesn't stress the Kafka broker. However, it is single-node only as of now (06/2022). See the tutorial.

Another popular option is to use a stream processor and store the hourly aggregates in an attached database (for example, Flink storing data to Redis). This is a do-it-yourself approach. Also have a look at Hazelcast: it is one process running both the stream processing services and a queryable store.
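To make the do-it-yourself approach more concrete, here is a minimal Python sketch of the core logic: a one-hour tumbling window that emits exactly one average per sensor once the hour has closed, which is the "single value between two timestamps" asked about in the comments. It is not tied to Flink, Hazelcast, or any Kafka client. The event shape `(sensor_id, epoch_seconds, value)`, the function name `hourly_averages`, and the assumption that events arrive in timestamp order are all mine, not from the question.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

WINDOW_SECONDS = 3600  # one-hour tumbling window

# Hypothetical event and result shapes, chosen for illustration only.
Event = Tuple[str, int, float]   # (sensor_id, epoch_seconds, value)
Result = Tuple[str, int, float]  # (sensor_id, window_start, average)


def hourly_averages(events: Iterable[Event]) -> Iterator[Result]:
    """Yield one average per sensor for each closed one-hour window.

    Assumes events arrive in timestamp order; late or out-of-order
    events are not handled in this sketch.
    """
    current_window: Optional[int] = None
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)

    for sensor_id, ts, value in events:
        # Align the timestamp to the start of its hour.
        window_start = ts - ts % WINDOW_SECONDS
        if current_window is not None and window_start != current_window:
            # The previous hour is complete: emit its final averages once.
            for sid in sorted(sums):
                yield sid, current_window, sums[sid] / counts[sid]
            sums.clear()
            counts.clear()
        current_window = window_start
        sums[sensor_id] += value
        counts[sensor_id] += 1
    # The last window is still open, so nothing is emitted for it.


if __name__ == "__main__":
    sample = [
        ("a", 0, 10.0),
        ("b", 10, 4.0),
        ("a", 1800, 20.0),
        ("a", 3600, 7.0),   # first event of the next hour closes hour 0
        ("b", 7300, 1.0),   # closes the hour starting at 3600
    ]
    for result in hourly_averages(sample):
        print(result)
```

In a real pipeline the events would come from a consumer on the sensor topic, and each yielded tuple would be produced to the output topic (or written to the attached store) with whatever client you already use. A production version would also need a policy for late events and for closing a window when no newer event arrives.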

Vlado Schreiner