
I am using Kafka Streams 2.2.1.

I am using suppress to hold back events until a window closes, with event-time semantics. However, the suppressed messages are only emitted once a new message becomes available on the stream.

The following code is extracted to demonstrate the problem:

    KStream<UUID, String>[] branches = is
            .branch((key, msg) -> "a".equalsIgnoreCase(msg.split(",")[1]),
                    (key, msg) -> "b".equalsIgnoreCase(msg.split(",")[1]),
                    (key, value) -> true);

    KStream<UUID, String> sideA = branches[0];
    KStream<UUID, String> sideB = branches[1];

    KStream<Windowed<UUID>, String> sideASuppressed =
            sideA.groupByKey(Grouped.with(new MyUUIDSerde(), Serdes.String()))
                 .windowedBy(TimeWindows.of(Duration.ofMinutes(31))
                                        .grace(Duration.ofMinutes(32)))
                 .reduce((v1, v2) -> v1)
                 .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                 .toStream();

Messages are only streamed from 'sideASuppressed' when a new message reaches the 'sideA' stream (messages arriving on 'sideB' will not cause the suppress to emit anything, even if the window's closure time passed long ago). Although in production this is unlikely to happen often due to high volume, there are enough cases where it is essential not to wait for a new message on the 'sideA' stream.

Thanks in advance.

erankl
  • That is expected behavior -- if no data arrives, event-time does not change and thus a window cannot be closed. – Matthias J. Sax Oct 28 '19 at 02:53
  • The problem here is a little different. New events still reach the system, so the system's event time does change. However, messages are suppressed on a specific stream, and as long as no messages reach that specific stream, the window is not closed and the suppressed messages remain suppressed. Producing messages to that specific stream in order to force the window to close is possible, but it has to be implemented for each suppress in the code, which significantly damages readability – erankl Oct 28 '19 at 08:12
  • Understood -- again, it's by design. Check out the original design document for more details: https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables – Matthias J. Sax Oct 28 '19 at 17:07
  • A possible solution is listed here: https://stackoverflow.com/a/60824254/458370 – Ant Kutschera Mar 25 '20 at 06:59

1 Answer


According to the Kafka Streams documentation:

Stream-time is only advanced if all input partitions over all input topics have new data (with newer timestamps) available. If at least one partition does not have any new data available, stream-time will not be advanced and thus punctuate() will not be triggered if PunctuationType.STREAM_TIME was specified. This behavior is independent of the configured timestamp extractor, i.e., using WallclockTimestampExtractor does not enable wall-clock triggering of punctuate().

I am not sure why this is the case, but it explains why suppressed messages are only emitted when new messages are available on the stream the suppress operates on.
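To make the mechanics concrete, here is a minimal plain-Java sketch (no Kafka dependency; the class and field names are made up for illustration, using the 31-minute window and 32-minute grace from the question) of how stream time drives window closing: stream time only advances when a record arrives, so a window's close time (window end plus grace) is never reached without further input on that task:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class StreamTimeSketch {
    static final long WINDOW_MS = Duration.ofMinutes(31).toMillis();
    static final long GRACE_MS  = Duration.ofMinutes(32).toMillis();

    long streamTime = Long.MIN_VALUE;                  // advances only when a record arrives
    final List<Long> openWindowStarts = new ArrayList<>();
    final List<Long> emitted = new ArrayList<>();      // window start times flushed downstream

    void onRecord(long timestamp) {
        streamTime = Math.max(streamTime, timestamp);
        long windowStart = (timestamp / WINDOW_MS) * WINDOW_MS;
        if (!openWindowStarts.contains(windowStart)) {
            openWindowStarts.add(windowStart);
        }
        // A window is flushed only when stream time passes its end + grace;
        // with no new records, stream time stands still and nothing is emitted.
        openWindowStarts.removeIf(start -> {
            boolean closed = start + WINDOW_MS + GRACE_MS <= streamTime;
            if (closed) {
                emitted.add(start);
            }
            return closed;
        });
    }
}
```

In this sketch, a record at timestamp 0 opens a window but emits nothing; only a later record whose timestamp pushes stream time past 0 + 31min + 32min causes the first window to be flushed, which mirrors the observed behavior.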

If anyone can explain why the implementation is like this, I will be happy to learn. This behavior forces my implementation to emit messages just to get the suppressed messages to emit in time, and makes the code much less readable.
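For comparison, the transformer/processor workaround mentioned in the comments boils down to flushing on wall-clock time instead of stream time. Here is a minimal plain-Java sketch of that buffering logic (no Kafka dependency; class and method names are illustrative). In a real topology, onRecord would be a Transformer's transform(), and onPunctuate would be a Punctuator scheduled via context.schedule(...) with PunctuationType.WALL_CLOCK_TIME, so it fires even when no new input arrives:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WallClockSuppressSketch {
    final Map<String, String> buffer = new LinkedHashMap<>(); // latest value per key
    final List<String> forwarded = new ArrayList<>();         // what goes downstream

    void onRecord(String key, String value) {
        buffer.put(key, value);      // suppress: keep only the latest update per key
    }

    // Stands in for a wall-clock Punctuator: flush everything buffered,
    // regardless of whether any new record has arrived since the last flush.
    void onPunctuate() {
        buffer.forEach((k, v) -> forwarded.add(k + "=" + v));
        buffer.clear();
    }
}
```

The trade-off is different semantics: records are emitted on a timer rather than exactly when the event-time window closes, which may emit a window's result before all late records (within grace) have arrived.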

erankl
  • Your quote seems to be for "punctuations"... There are many reasons why suppress() is implemented that way -- overall, it's complicated. Check out the original design document: https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables – Matthias J. Sax Oct 28 '19 at 17:07
  • Thanks. I understand. The thing is that it is easier to implement suppression using a transformer/processor than using the actual suppress operation, since the latter requires feeding each suppressed stream with "control" messages, and then ignoring them (as they carry no logic), just to make sure stream time progresses – erankl Oct 29 '19 at 12:41
  • Well -- if you implement it manually, you get different semantics -- it seems you want "stream time" to progress even if no input data is there, which does not make sense given the definition of "stream time". You might be interested in this KIP though: https://cwiki.apache.org/confluence/display/KAFKA/KIP-424%3A+Allow+suppression+of+intermediate+events+based+on+wall+clock+time – Matthias J. Sax Oct 29 '19 at 17:10