tl;dr: Is there a way to emit the first element of a WindowedStream that matches some condition, and to stop processing all events in that window for a given key once the condition has been matched?


I am performing a reduce operation on a WindowedStream. I am trying to see if any element of the stream matches a given condition. If an element does match the condition, that element should be the output of the stream. After this element has been found, it is no longer necessary to process events for that window.

Here is a simplified example. (It is written in Kotlin, but it should be understandable to any Java or Scala developer who knows Flink. Please feel free to post an answer in any Flink-supported language.)

import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.streaming.api.datastream.DataStream
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time


data class Event(val eventId: Int, val timestampMillis: Long, val failed: Boolean)


fun eventAggregator(stream: DataStream<Event>): DataStream<Event> {
  // Use the event's timestamp for watermarks.
  val watermarks = WatermarkStrategy.forMonotonousTimestamps<Event>()
                                    .withTimestampAssigner { event, _ -> event.timestampMillis }
  
  // Key the stream by eventId
  val keyedStream = stream.assignTimestampsAndWatermarks(watermarks).keyBy(Event::eventId)

  // One-minute window
  val windowed = keyedStream.window(TumblingEventTimeWindows.of(Time.minutes(1)))
  
  // If any event with a given eventId failed, we want to mark that ID as failed.
  return windowed.reduce { a, b -> if (a.failed) a else b }
}

I tried implementing a custom trigger that delegates to EventTimeTrigger unless a failed event is found, in which case it returns FIRE. But this triggered the entire window, leading to incorrect results for most other events.

import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger
import org.apache.flink.streaming.api.windowing.triggers.Trigger
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

class FireOnFirstFail() : Trigger<Event, TimeWindow>() {

  private val delegate = EventTimeTrigger.create()

  override fun onElement(element: Event, timestamp: Long, window: TimeWindow, ctx: TriggerContext) =
    if (element.failed) {
      TriggerResult.FIRE // Very buggy
    } else {
      delegate.onElement(element, timestamp, window, ctx)
    }
  
  override fun onProcessingTime(time: Long, window: TimeWindow, ctx: TriggerContext) =
    delegate.onProcessingTime(time, window, ctx)

  override fun onEventTime(time: Long, window: TimeWindow, ctx: TriggerContext) =
    delegate.onEventTime(time, window, ctx)

  override fun clear(window: TimeWindow, ctx: TriggerContext) =
    delegate.clear(window, ctx)
}

...

  return windowed.trigger(FireOnFirstFail()).reduce...

I am running this on Kinesis Data Analytics, so it is using Flink 1.15.2, which is the latest available version at the time of writing.

Jake
  • You could do this with a custom Trigger. – David Anderson Jul 24 '23 at 01:15
  • @DavidAnderson thanks, any tips for implementing such a trigger? Just edited the question to include my naive attempt at implementing one. – Jake Jul 24 '23 at 12:22
  • Could you clarify what you mean by "But this would trigger the entire window, leading to incorrect results for most other events." Also, don't you want to fire & purge, not just fire? Seems like you want to clear the window, and thus drop all subsequent events after seeing a failed event. – kkrugler Jul 24 '23 at 16:41
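
The fire-and-purge idea raised in the comments could look roughly like the sketch below. It is untested and assumes the Flink 1.15 Trigger API; the class name FireOnceOnFirstFail and the state name "failureEmitted" are made up for illustration, and Event is repeated from the question only so the snippet compiles on its own. The trigger keeps a per-key, per-window flag in trigger state: the first failed event fires and purges the window, and anything that arrives afterwards for that key and window is purged without firing.

```kotlin
import org.apache.flink.api.common.state.ValueStateDescriptor
import org.apache.flink.api.common.typeinfo.Types
import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger
import org.apache.flink.streaming.api.windowing.triggers.Trigger
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

// Same shape as the Event class in the question.
data class Event(val eventId: Int, val timestampMillis: Long, val failed: Boolean)

// Hypothetical trigger: emits the first failed event once, then ignores the rest of the window.
class FireOnceOnFirstFail : Trigger<Event, TimeWindow>() {

  private val delegate = EventTimeTrigger.create()

  // State obtained through the TriggerContext is scoped to the current key and window.
  private val firedDescriptor = ValueStateDescriptor<Boolean>("failureEmitted", Types.BOOLEAN)

  private fun alreadyFired(ctx: Trigger.TriggerContext): Boolean =
    ctx.getPartitionedState(firedDescriptor).value() == true

  override fun onElement(
    element: Event, timestamp: Long, window: TimeWindow, ctx: Trigger.TriggerContext
  ): TriggerResult = when {
    // A failure was already emitted for this key and window: drop the new element.
    alreadyFired(ctx) -> TriggerResult.PURGE
    // First failure: remember it, emit the reduced value, and clear the window contents.
    element.failed -> {
      ctx.getPartitionedState(firedDescriptor).update(true)
      TriggerResult.FIRE_AND_PURGE
    }
    // No failure so far: behave like the default event-time trigger.
    else -> delegate.onElement(element, timestamp, window, ctx)
  }

  override fun onProcessingTime(time: Long, window: TimeWindow, ctx: Trigger.TriggerContext): TriggerResult =
    delegate.onProcessingTime(time, window, ctx)

  override fun onEventTime(time: Long, window: TimeWindow, ctx: Trigger.TriggerContext): TriggerResult =
    // Suppress the end-of-window firing if the failure was already emitted.
    if (alreadyFired(ctx)) TriggerResult.PURGE else delegate.onEventTime(time, window, ctx)

  override fun clear(window: TimeWindow, ctx: Trigger.TriggerContext) {
    ctx.getPartitionedState(firedDescriptor).clear()
    delegate.clear(window, ctx)
  }
}
```

It would be wired in the same way as the attempt in the question, i.e. windowed.trigger(FireOnceOnFirstFail()).reduce { a, b -> if (a.failed) a else b }. Under these assumptions, a key with no failures should still emit once when the watermark passes the end of the window, while a key with a failure should emit the failed event as soon as it arrives and nothing else for that window.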
