
I am using a test setup that includes Confluent Platform (in Docker) and am processing records that contain a sensor ID, a timestamp, and a value. Using Robinhood's Faust (similar to Kafka Streams, but in Python), I am trying to do the following:

Whenever a new record arrives for a sensor, a "timer" should start. If no new record for that sensor ID is received within a given time, an error should be raised indicating a possible failure of that sensor/machine.

I have tried using time.sleep(), but it just sleeps for 10 seconds and then processes the next record.
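To illustrate the kind of per-sensor "timer" described above without blocking record processing, here is a minimal sketch using only the standard library's asyncio. Faust runs on asyncio, but this sketch does not use any Faust API; the class `SensorWatchdog`, its method names, and the short timeout in `demo` are hypothetical and chosen only for illustration. Each record cancels the sensor's pending timeout and schedules a new one, so the callback fires only when a sensor stays silent for the whole interval.

```python
import asyncio


class SensorWatchdog:
    """Hypothetical helper: one resettable timeout per sensor ID."""

    def __init__(self, timeout, on_timeout):
        self.timeout = timeout        # seconds of silence before alerting
        self.on_timeout = on_timeout  # callback taking the sensor ID
        self._handles = {}            # sensor ID -> pending TimerHandle

    def record_seen(self, sensor_id):
        # Cancel the pending timeout for this sensor, if there is one.
        handle = self._handles.get(sensor_id)
        if handle is not None:
            handle.cancel()
        # Schedule a fresh timeout; this does not block the caller.
        loop = asyncio.get_running_loop()
        self._handles[sensor_id] = loop.call_later(
            self.timeout, self._expire, sensor_id
        )

    def _expire(self, sensor_id):
        # No record arrived within the timeout: report the sensor.
        self._handles.pop(sensor_id, None)
        self.on_timeout(sensor_id)


async def demo():
    alerts = []
    watchdog = SensorWatchdog(timeout=0.4, on_timeout=alerts.append)
    watchdog.record_seen("sensor-a")
    watchdog.record_seen("sensor-b")
    await asyncio.sleep(0.2)
    watchdog.record_seen("sensor-a")  # resets the timer for sensor-a only
    await asyncio.sleep(0.3)          # sensor-b has now been silent too long
    snapshot = list(alerts)
    await asyncio.sleep(0.3)          # sensor-a times out as well
    return snapshot, alerts
```

Running `asyncio.run(demo())` exercises the sketch. In a real stream processor, `record_seen` would presumably be called from the async loop that consumes records. The timers live in memory only, so they would not survive a restart.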

Is it even possible to do something like this with the setup I am using?

Giorgos Myrianthous
LukasM
  • Since I wasn't able to solve the problem in the way described, I built a workaround using a dictionary with sensor IDs as keys and the most recent timestamps as values. Whenever ANY record is received, which is usually more than once a second, the dictionary is checked for timestamps older than 10 seconds. This is probably not the best solution, but so far it is the only one I could come up with. – LukasM Feb 28 '19 at 20:25
  • I had a similar problem and eventually realised that, since this whole thing is an event-driven system, it requires a new event to trigger any processing. I tried Faust's windowing with 1-second hopping windows plus an on_window_close method, so that whenever a new event came in or an old window closed, it recalculated (once per second) what the 'count' should be. The issue I found was that the windows were held in memory only, so on a rebalance or crash the window history was gone and I ended up with negative counts over time. I decided to calculate it manually using a side stream. – Fonty May 21 '22 at 22:45
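The dictionary workaround from the first comment can be sketched roughly as follows. This is plain Python with no Faust calls, and the names `last_seen`, `record_seen`, and `stale_sensors` are hypothetical. It assumes the timestamps are numbers in seconds and that "now" is passed in explicitly, which keeps the check easy to test.

```python
TIMEOUT_SECONDS = 10.0

# Sensor ID -> most recent timestamp seen for that sensor (in seconds).
last_seen = {}


def record_seen(sensor_id, timestamp):
    """Remember the newest timestamp for a sensor, ignoring older records."""
    previous = last_seen.get(sensor_id)
    if previous is None or timestamp > previous:
        last_seen[sensor_id] = timestamp


def stale_sensors(now, timeout=TIMEOUT_SECONDS):
    """Return the IDs of sensors whose last record is older than `timeout`."""
    return sorted(
        sensor_id
        for sensor_id, timestamp in last_seen.items()
        if now - timestamp > timeout
    )


if __name__ == "__main__":
    record_seen("sensor-a", 100.0)
    record_seen("sensor-b", 105.0)
    print(stale_sensors(now=111.0))  # only sensor-a has been silent > 10 s
```

As the comments note, the check only runs when something triggers it. Calling `stale_sensors` on every incoming record works as long as some sensor keeps sending. A periodic task is the alternative (Faust documents an `@app.timer` decorator for this, though whether it fits this setup is an assumption). Because the dictionary is held in memory, it shares the limitation raised in the second comment: its contents are lost on a crash or rebalance.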

1 Answer


You can use KSQL's tumbling windows.

First, create a stream of sensor information:

CREATE STREAM sensorinformation \
  (sensorid VARCHAR, \
   sensortimestamp BIGINT, \
   value VARCHAR) \
 WITH (KAFKA_TOPIC='sensorinformationtopic', \
       VALUE_FORMAT='DELIMITED', \
       KEY='sensorid', \
       TIMESTAMP='sensortimestamp');

Finally, create a table of faulty sensors, i.e. sensors that appear only once within a 10-second window:

CREATE TABLE faulty_sensors AS \
  SELECT sensorid, \
         count(*) \
  FROM sensorinformation \
  WINDOW TUMBLING (SIZE 10 SECONDS) \
  GROUP BY sensorid \
  HAVING count(*) = 1;
Giorgos Myrianthous
  • I think you misunderstood the problem here. A failure in this case would be if there is no new record for 10 seconds or more. As far as I understand your answer, a failure would be if there is more than one record per 10-second window. If I am wrong, please let me know. – LukasM Feb 28 '19 at 20:22