I am a newbie to Flink and came across an article that said:

"A flink developer is responsible for moving event time forward by arranging the watermark in the stream".

So I worked out a possible explanation. As I understand it, if I instruct the program to emit watermarks every 5 seconds, then every 5 seconds Flink invokes the getCurrentWatermark() method of AssignerWithPeriodicWatermarks. If the method returns a non-null value with a timestamp larger than the timestamp of the previous watermark, the new watermark is forwarded; otherwise, no watermark is produced. This check is necessary to ensure that event time never moves backward.
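To make the check described above concrete, here is a minimal plain-Java sketch of it. It does not use any Flink classes: `PeriodicWatermarkSketch`, `onEvent`, `onPeriodicEmit`, and the 2-second out-of-orderness bound are all hypothetical names and values chosen for illustration, and the real assigner may differ in detail.

```java
// Plain-Java model of periodic watermarking; no Flink classes are used.
// All names here are hypothetical and exist only for illustration.
class PeriodicWatermarkSketch {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen = Long.MIN_VALUE;
    private long lastEmittedWatermark = Long.MIN_VALUE;

    PeriodicWatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    // Called for every event: remember the highest event timestamp seen so far.
    void onEvent(long eventTimestampMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMs);
    }

    // Called once per watermark interval. Returns the new watermark, or null
    // when the candidate would not move event time forward.
    Long onPeriodicEmit() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return null; // no events yet, so nothing to emit
        }
        long candidate = maxTimestampSeen - maxOutOfOrdernessMs;
        if (candidate <= lastEmittedWatermark) {
            return null; // not larger than the previous watermark
        }
        lastEmittedWatermark = candidate;
        return candidate;
    }

    public static void main(String[] args) {
        PeriodicWatermarkSketch wm = new PeriodicWatermarkSketch(2_000);
        wm.onEvent(10_000);
        System.out.println(wm.onPeriodicEmit()); // 8000
        wm.onEvent(9_000);                       // out-of-order event
        System.out.println(wm.onPeriodicEmit()); // null
        wm.onEvent(15_000);
        System.out.println(wm.onPeriodicEmit()); // 13000
    }
}
```

The second call returns null because the out-of-order event does not raise the maximum timestamp, so the candidate watermark is not larger than the one already emitted.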

So, once everything within a window has arrived, the operators are triggered and the computations are done accordingly. But what is the role of process functions? Watermarks can only be used by process functions, right?

whatsinthename

1 Answer

What you've said about periodic watermarks is correct. But normally I would recommend leaving the auto-watermark interval at its default value of 200 ms; setting it to 5 seconds can add up to 5 seconds of latency to your pipeline.

At the lowest levels of Flink's APIs, watermarks serve to trigger event-time timers, which are only exposed in process functions. Process functions are an essential building block for implementing event-driven applications. You are hooking right into the main event loop, processing each event as it becomes available. You also have access to fault-tolerant, low-latency, scalable state storage, and timers.
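As a rough illustration of how a watermark triggers event-time timers, here is a minimal plain-Java sketch. It is a simplified model, not Flink's timer service: `EventTimeTimerSketch`, `registerTimer`, and `advanceWatermark` are hypothetical names, and the model assumes a single key and that a timer fires once the watermark reaches its timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simplified single-key model of event-time timers; names are hypothetical.
class EventTimeTimerSketch {
    // Sorted set: registering the same timestamp twice keeps a single timer.
    private final TreeSet<Long> timers = new TreeSet<>();
    private long currentWatermark = Long.MIN_VALUE;

    void registerTimer(long timestampMs) {
        timers.add(timestampMs);
    }

    // Advances the watermark and returns the timers that fire, in timestamp
    // order. A watermark that does not move event time forward is ignored.
    List<Long> advanceWatermark(long watermarkMs) {
        List<Long> fired = new ArrayList<>();
        if (watermarkMs <= currentWatermark) {
            return fired;
        }
        currentWatermark = watermarkMs;
        while (!timers.isEmpty() && timers.first() <= watermarkMs) {
            fired.add(timers.pollFirst());
        }
        return fired;
    }

    public static void main(String[] args) {
        EventTimeTimerSketch timers = new EventTimeTimerSketch();
        timers.registerTimer(30_000);
        timers.registerTimer(60_000);
        System.out.println(timers.advanceWatermark(29_999)); // []
        System.out.println(timers.advanceWatermark(45_000)); // [30000]
        System.out.println(timers.advanceWatermark(60_000)); // [60000]
    }
}
```

Nothing in this model looks at the wall clock: timers fire only because the watermark moved, which is why event-time results depend on watermarks being produced.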

At higher levels of the DataStream API, watermarks are used to trigger event-time windows, and by CEP (to sort streams before doing pattern matching). Watermarks are also used in the Table/SQL API by windows, interval joins, temporal joins, and by MATCH_RECOGNIZE. In all of these cases, watermarks are used by these temporal operators to observe the progress of event time so they can emit results when those results are ready, and then to free state that is no longer useful.
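The window case can be sketched the same way. The plain-Java model below is an assumption-laden simplification, not Flink's window operator: `TumblingWindowSketch` is a hypothetical name, the 30-second window and 40-second out-of-orderness bound are illustrative values, the watermark is recomputed on every event rather than periodically, timestamps are assumed non-negative, and millisecond off-by-one details are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of a tumbling event-time window that counts events.
// Names and parameter values are hypothetical.
class TumblingWindowSketch {
    private final long windowSizeMs;
    private final long maxOutOfOrdernessMs;
    private final TreeMap<Long, Integer> countsByWindowStart = new TreeMap<>();
    private long maxTimestampSeen = Long.MIN_VALUE;

    TumblingWindowSketch(long windowSizeMs, long maxOutOfOrdernessMs) {
        this.windowSizeMs = windowSizeMs;
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    // Processes one event and returns a description of every window that
    // fires as a result. Assumes timestampMs >= 0.
    List<String> onEvent(long timestampMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, timestampMs);
        long watermark = maxTimestampSeen - maxOutOfOrdernessMs;
        long start = timestampMs - (timestampMs % windowSizeMs);
        // Events for a window the watermark has already passed are dropped.
        if (start + windowSizeMs > watermark) {
            countsByWindowStart.merge(start, 1, Integer::sum);
        }
        List<String> fired = new ArrayList<>();
        // Fire every window whose end the watermark has reached, then free
        // its state by removing it from the map.
        while (!countsByWindowStart.isEmpty()
                && countsByWindowStart.firstKey() + windowSizeMs <= watermark) {
            Map.Entry<Long, Integer> e = countsByWindowStart.pollFirstEntry();
            long end = e.getKey() + windowSizeMs;
            fired.add("[" + e.getKey() + ", " + end + ") count=" + e.getValue());
        }
        return fired;
    }

    public static void main(String[] args) {
        TumblingWindowSketch w = new TumblingWindowSketch(30_000, 40_000);
        System.out.println(w.onEvent(5_000));  // []
        System.out.println(w.onEvent(20_000)); // []
        System.out.println(w.onEvent(69_999)); // []
        System.out.println(w.onEvent(70_000)); // [[0, 30000) count=2]
    }
}
```

In this model the first window fires only when an event with timestamp 70,000 arrives, that is, 40 seconds of event time past the window's end. How long that takes on the wall clock depends entirely on how fast events are processed.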

David Anderson
  • Okay. So if I use `TumblingWindows` with a 30-second window and don't use any watermark, is that okay? What would be the best practice? – whatsinthename Dec 24 '21 at 10:46
  • A TumblingEventTimeWindow will only produce results if there are watermarks. – David Anderson Dec 24 '21 at 17:42
  • Cool. Thanks for taking time out to help me, David :) – whatsinthename Dec 24 '21 at 19:17
  • If I use windows of `30 secs` and a `watermark` of `40 secs`, does that mean the computation for a particular window will take place after 40 secs? – whatsinthename Dec 24 '21 at 20:28
  • Is my last comment correct? – whatsinthename Dec 25 '21 at 08:31
  • Not exactly. Flink will wait until it processes an event with a timestamp at least 40 seconds past the end of the window. But if you are processing historic data, that might only require a few milliseconds of compute time -- or it might take hours -- there's a complete decoupling of event time and processing time. Plus, Flink will also have to wait for the autowatermarking interval to expire. – David Anderson Dec 25 '21 at 11:05
  • I didn't get you, David. Could you please elaborate in layman terms for me? That would be helpful for me. – whatsinthename Dec 25 '21 at 11:10
  • I also came across `TumblingWindows` examples where watermarking is not set. In that case, is it using 2 secs as the watermark by default? – whatsinthename Dec 25 '21 at 11:12
  • There is no default watermarking. But some examples are relying on sources that do watermarking, while others are using processing time windows (which don't require watermarks). – David Anderson Dec 25 '21 at 11:30
  • Comments aren't a good place for lengthy explanations. Maybe continue by reading https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/learn-flink/streaming_analytics/ next, and then ask a new question if something isn't clear. – David Anderson Dec 25 '21 at 11:40
  • Thanks a lot, David for your time :) Now it is clear to me. – whatsinthename Dec 25 '21 at 13:02
  • Could you please help me with this? https://stackoverflow.com/questions/70488905/how-to-get-datastream-from-the-string-returned-by-the-method-in-flink-kafka-prob – whatsinthename Dec 26 '21 at 19:53