
I'm creating a pipeline that ingests an unbounded data source and performs an aggregation. The aggregation runs over 10-minute windows based on event time, with a 5-minute buffer for late-arriving events. I want the aggregation result to be emitted only once, after both the 10-minute window and the 5-minute buffer have passed.

I don't know how to make the window emit the result only once. I believe the correct way is to use the AfterWatermark trigger, but if I use withLateFirings() the result is emitted twice: once after the window has passed and once after the late-firing duration has passed. If late firings are not used, the late events are not included in the computation, which doesn't fulfill my requirements.

public class WindowFactory {
  private static final Duration FIVE_MINUTES = Duration.standardMinutes(5);

  public static Window<Message> getMessageFixedWindow(Duration duration) {
    return Window.<Message>into(FixedWindows.of(duration))
                 .triggering(
                      AfterWatermark
                        .pastEndOfWindow()
                        .withLateFirings(
                             AfterProcessingTime
                                .pastFirstElementInPane()
                                .plusDelayOf(FIVE_MINUTES)))
                 .discardingFiredPanes()
                 .withAllowedLateness(FIVE_MINUTES);
  }
}

Please suggest a good way to produce only one result after the 10-minute window and the 5-minute buffer.

Anton

2 Answers

What you have set up right now will trigger twice: once when the watermark has passed the end of the window, and once when the late-data buffer closes.

There is no way to disable the first firing at the end of the window with triggers alone. However, you can detect that you are seeing the first firing and ignore it by inspecting the pane with PaneInfo.isLast(), available in a DoFn as c.pane().isLast().

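@ProcessElement
public void processElement(ProcessContext c) {
  // Drop every pane except the final one for the window.
  if (!c.pane().isLast()) {
    return;
  }
  // Only the last pane reaches this point; pass it downstream.
  c.output(c.element());
}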

Note that you cannot make the system emit the final result at the end of the window in cases where no late data arrives, because at that point the system doesn't know whether late data will still show up. I don't think you were asking about this specifically, but I wanted to mention it.

Alex Amato

Try the solution from this post:

 // We first specify to never emit any panes
 .triggering(Never.ever())

 // We then specify to fire always when closing the window. This will emit a
 // single final pane at the end of allowedLateness
 .withAllowedLateness(FIVE_MINUTES, Window.ClosingBehavior.FIRE_ALWAYS)
 .discardingFiredPanes())

As described in the code comments, you first use the Never.ever() trigger so that the window never fires on its own, and hence does not fire when the watermark passes the end of the window. The closing behaviour Window.ClosingBehavior.FIRE_ALWAYS then overrides the trigger and ensures that a pane is always fired when the window closes, after the allowed lateness.

This results in exactly one pane being fired after the 10-minute window plus the 5-minute lateness buffer.
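To make the timing concrete, here is a minimal sketch in plain Java (no Beam dependency) that computes roughly when that single pane becomes eligible to fire for a given event timestamp. It assumes fixed windows aligned to the epoch with no offset, and it ignores the one-millisecond difference between a window's end and its maximum timestamp. The class and method names (WindowTiming, windowStart, finalPaneAt) are hypothetical and are not part of the Beam API.

```java
import java.time.Duration;
import java.time.Instant;

public class WindowTiming {
  // Start of the fixed window that contains the event timestamp,
  // assuming windows are aligned to the epoch with zero offset.
  static Instant windowStart(Instant eventTime, Duration size) {
    long sizeMs = size.toMillis();
    long ts = eventTime.toEpochMilli();
    return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMs));
  }

  // Approximate event-time instant the watermark must pass before the
  // window closes and the single final pane is emitted.
  static Instant finalPaneAt(Instant eventTime, Duration size, Duration allowedLateness) {
    return windowStart(eventTime, size).plus(size).plus(allowedLateness);
  }

  public static void main(String[] args) {
    Instant event = Instant.parse("2020-01-01T12:03:20Z");
    Duration size = Duration.ofMinutes(10);
    Duration lateness = Duration.ofMinutes(5);
    System.out.println("window start: " + windowStart(event, size));
    System.out.println("final pane after watermark passes: " + finalPaneAt(event, size, lateness));
  }
}
```

For an event stamped 12:03:20, the window covers 12:00 to 12:10, so the single result is held back until the watermark passes about 12:15. This is a statement about event time: the wall-clock moment of the firing depends on how the watermark advances in the runner.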

Joe Stoker