
I have n sensors, each generating a measurement every t minutes to its own topic, as follows:

Topic_1: {timestamp: 1, measurement: 1}, {timestamp: 2, measurement: 4}, ...

Topic_2: {timestamp: 1, measurement: 5}, {timestamp: 2, measurement: 3}, ...
 
Topic_n: {timestamp: 1, measurement: 3}, {timestamp: 2, measurement: 5}, ...

The number of sensors is dynamic, but for the sake of simplicity let's assume I have 3 sensors and therefore 3 topics receiving data every t minutes.

What is the best topology for joining all measurements with the same timestamp as shown below?

{timestamp: 1, measurement: 1} 
{timestamp: 1, measurement: 5}  --------> {timestamp: 1, measurements: [1,5,3]}
{timestamp: 1, measurement: 3}
utxeee

1 Answer


You have a few options. You can use a join and define a joiner that builds the list. However, this has to be a windowed (stream-stream) join, so the result is windowed as well. If your measurements always arrive within the grace period, this should not be a problem.
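A minimal sketch of the joiner idea in plain Java (this is not the Kafka Streams API itself): with three topics you would chain two joins, so the first joiner turns two measurements into a list and the second appends the third. The `BiFunction`s below only mirror the shape of a Kafka Streams `ValueJoiner` (`apply(value1, value2)`); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper: models the joiners you would pass to two chained
// stream-stream joins (Topic_1 join Topic_2, then that result join Topic_3).
class ListJoiners {
    // First join: two single measurements become a two-element list.
    static final BiFunction<Integer, Integer, List<Integer>> PAIR =
            (left, right) -> {
                List<Integer> out = new ArrayList<>();
                out.add(left);
                out.add(right);
                return out;
            };

    // Every further join: copy the list built so far and append the next measurement.
    static final BiFunction<List<Integer>, Integer, List<Integer>> APPEND =
            (soFar, next) -> {
                List<Integer> out = new ArrayList<>(soFar);
                out.add(next);
                return out;
            };

    // Joins the three measurements that share one timestamp (the join key).
    static List<Integer> joinThree(int m1, int m2, int m3) {
        return APPEND.apply(PAIR.apply(m1, m2), m3);
    }
}
```

In an actual topology each joiner would go into `KStream#join` together with a `JoinWindows` whose size and grace period cover how late a measurement can arrive. Note that every additional topic needs one more join, which is why this approach only fits a fixed number of sensors.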

EDIT: If the number of topics can vary, the join approach would not work. Instead, you would need to use a pattern subscription and then aggregate.
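Roughly what pattern subscription buys you, sketched in plain Java: Kafka Streams accepts a `java.util.regex.Pattern` in `StreamsBuilder#stream`, and topics whose names match are read into a single stream, so new sensor topics should be picked up without changing the topology. The helper below is hypothetical and assumes the sensor topics keep the `Topic_<number>` naming from the question; it only shows which names such a pattern covers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: shows which topic names a subscription pattern covers.
// Assumes sensor topics are named Topic_<number>, as in the question.
class TopicPattern {
    static final Pattern SENSOR_TOPICS = Pattern.compile("Topic_\\d+");

    static List<String> matching(List<String> topics) {
        List<String> out = new ArrayList<>();
        for (String t : topics) {
            // matches() requires the whole topic name to fit the pattern
            if (SENSOR_TOPICS.matcher(t).matches()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

For example, with the topics `Topic_1` to `Topic_4`, `other` and `Topic_x`, only the four numbered sensor topics match, so a fourth sensor would be included automatically.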

A slightly more complicated option: if your timestamps do not have duplicates, you can use groupByKey and then aggregate the measurements into lists. This forms a table with the results you want. If you need the results as a stream, you can use toStream and filter out the updates whose list does not yet have length n.
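A plain-Java model of that groupByKey, aggregate, toStream and filter chain (a sketch of the logic, not Kafka Streams code). It assumes each record is `{timestamp, measurement}` and that the timestamp is already the key; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of groupByKey() + aggregate() + toStream() + filter().
// Each record is {timestamp, measurement}; the timestamp plays the role of the key.
class ListAggregation {
    // Returns the updates that survive the filter: lists holding exactly n measurements.
    static List<String> completeRows(int[][] records, int n) {
        Map<Integer, List<Integer>> table = new HashMap<>(); // stands in for the KTable
        List<String> emitted = new ArrayList<>();
        for (int[] r : records) {
            int ts = r[0];
            // aggregate(): the initializer creates an empty list, the aggregator appends
            List<Integer> agg = table.computeIfAbsent(ts, k -> new ArrayList<>());
            agg.add(r[1]);
            // toStream() would forward every update; filter() keeps complete lists only
            if (agg.size() == n) {
                emitted.add("{timestamp: " + ts + ", measurements: " + agg + "}");
            }
        }
        return emitted;
    }
}
```

With n = 3 and interleaved records for timestamps 1 and 2, only the two complete lists come out, e.g. `{timestamp: 1, measurements: [1, 5, 3]}`. This relies on the "no duplicates" condition above: a duplicate measurement would make a list reach length n too early. The answer does not say how the records get keyed by timestamp; rekeying with `selectKey` before `groupByKey` is one possibility.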

There are probably a few other ways of doing this as well, but these come to mind first.

wcarlson
  • Given that the number of input topics may vary, I don't think that joining would work. Instead, you would need to subscribe to all topics in a single stream using pattern subscription, and window-aggregate the data accordingly. – Matthias J. Sax Jan 19 '21 at 01:23
  • That is a good point, @MatthiasJ.Sax. I didn't think of that; I'll update the answer. – wcarlson Jan 19 '21 at 16:36
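To illustrate the window-aggregate approach from the comment, here is a simplified plain-Java model (not the Kafka Streams API and not its exact implementation): records from all sensor topics are bucketed into tumbling windows of size t, and a record is dropped as late if the highest timestamp seen so far has already passed its window end plus a grace period. The class name, the alignment of windows to multiples of the size, and this lateness rule are assumptions for illustration; keying is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified tumbling-window aggregation model (hypothetical, not Kafka Streams code).
// Windows are [start, start + size) with start a multiple of size. A record is
// discarded once "stream time" (max timestamp seen) reaches window end + grace.
class WindowedAggregation {
    static Map<Long, List<Integer>> aggregate(long[] timestamps, int[] measurements,
                                              long size, long grace) {
        Map<Long, List<Integer>> windows = new TreeMap<>(); // window start -> measurements
        long streamTime = Long.MIN_VALUE;                   // highest timestamp seen so far
        for (int i = 0; i < timestamps.length; i++) {
            long ts = timestamps[i];
            streamTime = Math.max(streamTime, ts);
            long start = ts - Math.floorMod(ts, size);
            long end = start + size;
            if (end + grace <= streamTime) {
                continue; // late record: its window is already closed
            }
            windows.computeIfAbsent(start, k -> new ArrayList<>()).add(measurements[i]);
        }
        return windows;
    }
}
```

For timestamps 1, 3, 5, 12, 4 with size 10, the record at timestamp 4 is dropped with no grace period but kept with a grace period of 5, which is the trade-off the answer mentions. In Kafka Streams this would roughly correspond to `groupByKey().windowedBy(...)` with a `TimeWindows` definition followed by `aggregate`, with the grace period configured on the windows.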