EDITED:

I have a requirement to skip records created in the 10 seconds before and the 20 seconds after a gap in the incoming data.

(A gap occurs when the event times of two consecutive records differ by more than 3 seconds.)

The remaining data is used to calculate the average or median over a time window.

Can this be done with Kinesis Analytics, Dataflow, the Flink API, or some other solution?

Ajmal M Sali

1 Answer

If I understand correctly, you want to find the median and average of records that are created between 10 and 20 seconds after a gap of at least 3 seconds.

Using Flink (or Kinesis Analytics, which is a managed Flink service), you could do that with session windows, or with a ProcessFunction. Process functions are more flexible, and are capable of handling pretty much anything you might need. However, in this case, session windows are probably simpler, especially if you are willing to wait until a session ends (i.e., until the next gap) to get the results. You could avoid this delay by implementing a custom window Trigger.
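
To make the session idea concrete, here is a minimal sketch in plain Python. It does not use the Flink API; it only emulates what a session window with a 3-second gap would group together, then keeps the records that fall between 10 and 20 seconds after the start of each session. The names `split_into_sessions` and `session_stats` are hypothetical, and the sketch assumes that events arrive sorted by event time, that a gap means strictly more than 3 seconds, that the first event after a gap marks the session start, and that the start of the stream counts as a session start too.

```python
from statistics import mean, median

GAP_SECONDS = 3.0   # a gap: more than 3 s between consecutive event times
KEEP_FROM = 10.0    # keep records from 10 s after the session start ...
KEEP_UNTIL = 20.0   # ... up to and including 20 s after the session start


def split_into_sessions(events, gap=GAP_SECONDS):
    """Group (event_time, value) pairs into sessions separated by gaps.

    Assumes the events are already sorted by event time.
    """
    sessions = []
    for ts, value in events:
        # Start a new session for the first event, or when the distance
        # to the previous event's time exceeds the gap.
        if not sessions or ts - sessions[-1][-1][0] > gap:
            sessions.append([])
        sessions[-1].append((ts, value))
    return sessions


def session_stats(events):
    """Return mean and median of the kept records, one entry per session."""
    results = []
    for session in split_into_sessions(events):
        start = session[0][0]  # event time of the first record in the session
        kept = [v for ts, v in session
                if KEEP_FROM <= ts - start <= KEEP_UNTIL]
        if kept:  # sessions shorter than 10 s contribute nothing
            results.append({
                "session_start": start,
                "count": len(kept),
                "mean": mean(kept),
                "median": median(kept),
            })
    return results
```

Like a session window, this only produces a result once a session is complete. In Flink, emitting results earlier would need a custom Trigger or a ProcessFunction with timers.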

window tutorial
process function tutorial

David Anderson
  • David, I have edited the question; it is mainly about skipping certain records used in the calculation. – Ajmal M Sali Jul 19 '20 at 13:35
  • I don't understand the timing requirements, but as for skipping events, with either of the solutions I mentioned you will process the events one-at-a-time and can skip any that you don't want to affect the results. – David Anderson Jul 19 '20 at 13:39