My requirement is to keep 30 days of data in the stream so that any given day is available for processing. On the first day, when the Flink application starts, it will fetch 30 days of data from the database and merge it with the current stream data. My challenge is managing the 30-day window. Suppose I create a sliding window of 30 days with a slide of 1 day, something like:

WatermarkStrategy<EventResponse> wmStrategy = WatermarkStrategy
        .<EventResponse>forBoundedOutOfOrderness(Duration.ofMillis(1))
        // Flink expects event timestamps in epoch milliseconds, not seconds
        .withTimestampAssigner((eventResponse, l) ->
                eventResponse.getLocalDateTime().toInstant(ZoneOffset.MAX).toEpochMilli());

ds.assignTimestampsAndWatermarks(wmStrategy)
        // 30-day windows that slide forward by 1 day
        .windowAll(SlidingEventTimeWindows.of(Time.days(30), Time.days(1)))
        .process(new ProcessAllWindowFunction<EventResponse, Object, TimeWindow>() {
            @Override
            public void process(Context context, Iterable<EventResponse> iterable, Collector<Object> collector) throws Exception {
                // processing logic
            }
        });

In this case, process() does not start processing immediately when the first element of the historical data is added. My assumptions are: a) by default, the first event will be part of the first window and will be available for processing immediately; b) the next day, the job will remove the oldest day's data from the window. Are my assumptions correct for this piece of code? Thank you for your help.

Ashutosh

1 Answer

I don't think your assumptions are correct in this case. When you use a time window with a ProcessAllWindowFunction, the function can process the data only once the window is closed (in your case, after 30 days). Here, the slide means that the second window will contain 29 days of the first window plus the 31st day, which was not part of the first window.
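
To make the overlap between consecutive windows concrete, here is a minimal plain-Java sketch of how a sliding assigner maps one timestamp to windows. It assumes windows aligned to the epoch with zero offset, which is the arithmetic I understand SlidingEventTimeWindows to use; the class and method names are hypothetical, and nothing here calls Flink itself.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowSketch {
    // Returns the [start, end) pairs, in epoch milliseconds, of every
    // sliding window that contains the given timestamp.
    // Assumption: zero offset and a non-negative timestamp.
    static List<long[]> assignWindows(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        // Start of the latest window that begins at or before the timestamp.
        long lastStart = timestamp - (timestamp % slide);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        long size = Duration.ofDays(30).toMillis();
        long slide = Duration.ofDays(1).toMillis();
        long ts = Instant.parse("2020-07-30T11:37:00Z").toEpochMilli();

        List<long[]> windows = assignWindows(ts, size, slide);
        System.out.println("windows containing the event: " + windows.size());
        for (long[] w : windows) {
            System.out.println(Instant.ofEpochMilli(w[0]) + " .. " + Instant.ofEpochMilli(w[1]));
        }
    }
}
```

With a 30-day size and a 1-day slide, each event lands in 30 overlapping windows, and neighbouring windows share 29 days of data. Each of those windows is evaluated separately, and only once the watermark passes its end.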

Dominik Wosiński
  • The answer is mostly correct. I'd just add that you could use an offset of Time.days(-29) to also force the window to close after the first day. – Arvid Heise Jul 30 '20 at 11:37
  • Thank you, Dominik and Arvid. Both suggestions are good. Obviously, it is not a good idea to wait a whole day for the window to close before processing. If I want to start processing events immediately with the first window, what do I need to do? I am now not sure how to maintain a 30-day window of historical data :( . I appreciate your help on this. Thanks. – Ashutosh Jul 30 '20 at 12:34
  • You should take a look at triggers, which would allow you to emit partial window results :) – Dominik Wosiński Jul 30 '20 at 13:38
  • @ArvidHeise, I tried Time.days(-29), but it gives an error because abs(offset) should be < the slide, which is 1 day in my case. – Ashutosh Jul 31 '20 at 06:43
  • @DominikWosiński, I tried to implement a custom Trigger but am not getting the expected result. For example, if I FIRE the trigger in onElement(), the element comes through for processing 30 times, and it also shows up in the next window. So the first time the size is 1 (30 times), the next time the size is 2 (including the first element, 30 times) ... I am missing something :( – Ashutosh Jul 31 '20 at 06:46
  • If you want to do some preliminary calculations before the window is closed, have a look at reduce and aggregate (the more general of the two). If your calculation is somewhat associative (like sum or count), you can also first use a tumbling window of 1 day (that is, 1 day size with 1 day slide) and then run a second window operation over the aggregates. – Arvid Heise Jul 31 '20 at 07:33
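
The two-stage idea from the last comment can be sketched in plain Java, without Flink, to show why it keeps the state small: raw events are first collapsed into one aggregate per day, and the 30-day result is then computed from at most 30 daily values instead of from every raw event. This is only an illustration under the assumption that the calculation is a sum; the class, method names, and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class DailyPreAggregationSketch {
    static final long DAY_MS = 86_400_000L;

    // Stage 1: collapse raw {timestampMillis, value} events into one sum per day,
    // playing the role of a 1-day tumbling window.
    static TreeMap<Long, Long> dailySums(List<long[]> events) {
        TreeMap<Long, Long> sums = new TreeMap<>();
        for (long[] e : events) {
            long day = Math.floorDiv(e[0], DAY_MS); // day index since the epoch
            sums.merge(day, e[1], Long::sum);
        }
        return sums;
    }

    // Stage 2: combine the daily sums for the windowDays days ending with lastDay (inclusive),
    // playing the role of the second, longer window over the aggregates.
    static long rollingSum(TreeMap<Long, Long> daily, long lastDay, int windowDays) {
        long total = 0;
        for (long v : daily.subMap(lastDay - windowDays + 1, true, lastDay, true).values()) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<long[]> events = new ArrayList<>();
        events.add(new long[] {10, 5});                // day 0
        events.add(new long[] {20, 7});                // day 0
        events.add(new long[] {DAY_MS + 10, 3});       // day 1
        events.add(new long[] {30 * DAY_MS + 10, 4});  // day 30

        TreeMap<Long, Long> daily = dailySums(events);
        System.out.println(rollingSum(daily, 29, 30)); // days 0..29 -> 15
        System.out.println(rollingSum(daily, 30, 30)); // days 1..30 -> 7
    }
}
```

When the window advances by a day, the oldest daily sum simply falls out of the range, which matches the "drop the oldest day" behaviour asked about in the question.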