
I have a specific use case in which I am consuming data from a single topic. That topic receives messages that contain a specific type.

My service has a mapping between those types and a time window (for example, type X is 1 hour, type Y is 2 hours, etc.). Is it possible to create a single stream that consumes a message, reads its type, and then creates a dynamically sized window aggregation based on that mapping?

For example, assuming my topic contains 3 messages, and my service has the following mapping:

Type X - 1 hour, Type Y - 2 hours

And these are the messages currently in my Kafka input topic:

1. Type X
2. Type Y
3. Type X

When the 1st message arrives, I want a window aggregation to be created for type X that lasts for 1 hour. Once the hour is over, I want some business logic to run. When the 2nd message arrives, it is of type Y, so a different window is created that lasts for 2 hours; once that window closes, I again want some specific code to run.

I know that I can achieve this by separating the messages into designated topics (one per message type), but the types are dynamic and I want to avoid creating and destroying topics.

I've also looked at Session windows, but the inactivity gap is still static, so it does not solve my use case.

Emil Gelman

1 Answer


It seems that what you're looking for is not so much windowing per se as some sort of timer. Kafka Streams doesn't offer timers out of the box. One way around this, however, is to schedule a Punctuator using the Processor API: https://docs.confluent.io/current/streams/developer-guide/processor-api.html#defining-a-stream-processor

But it would still require a fixed parameter for the time, i.e. not a dynamic one. If a dynamic duration is strictly necessary, it can also be done by registering a timer using Apache Flink, for example.


Fixed-time punctuation function

Here is a possibility when using a fixed-time Punctuator for Kafka Streams:

  1. Branch the input stream into separate streams, one per type

  2. Implement a custom Processor class:

Custom Processor implementation: https://gist.github.com/dvcanton/45818abf4903b54f9fb0028025b6729a
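To make the idea more concrete, here is a minimal sketch of the bookkeeping such a processor might do. It is not the code from the gist above and it deliberately uses no Kafka Streams classes. `TypeWindowTracker`, `onMessage` and `expire` are hypothetical names invented for illustration. The assumption is that `onMessage` would be called from the processor's `process()` method, and `expire` from a Punctuator scheduled at a short fixed interval via `ProcessorContext#schedule`.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Kafka Streams class): one open window per message type. */
class TypeWindowTracker {

    /** An open window: the time it closes and the number of messages seen so far. */
    private static final class OpenWindow {
        final long endMs;
        long count;

        OpenWindow(long endMs) {
            this.endMs = endMs;
        }
    }

    private final Map<String, Duration> windowSizes;              // type -> window length
    private final Map<String, OpenWindow> open = new HashMap<>(); // type -> open window

    TypeWindowTracker(Map<String, Duration> windowSizes) {
        this.windowSizes = windowSizes;
    }

    /** Opens a window for the type if none is open, then counts the message. */
    void onMessage(String type, long timestampMs) {
        Duration size = windowSizes.get(type);
        if (size == null) {
            return; // unknown type: ignored in this sketch
        }
        OpenWindow window = open.computeIfAbsent(
                type, t -> new OpenWindow(timestampMs + size.toMillis()));
        window.count++;
    }

    /** Closes every window whose end time has passed; returns "type=count" per closed window. */
    List<String> expire(long nowMs) {
        List<String> closed = new ArrayList<>();
        Iterator<Map.Entry<String, OpenWindow>> it = open.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, OpenWindow> entry = it.next();
            if (entry.getValue().endMs <= nowMs) {
                closed.add(entry.getKey() + "=" + entry.getValue().count);
                it.remove(); // business logic for the finished window would run here
            }
        }
        return closed;
    }
}
```

With this layout the punctuation interval stays fixed, while the window length is looked up per type when the first message of that type arrives. One caveat of the sketch: a message that arrives after a window's end time but before the next `expire` call is still counted in the old window, so the punctuation interval bounds how precise the window end is. In a real processor the open windows would also need to live in a state store to survive restarts.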

Dave Canton
    I doubt that Flink can do more than Kafka Streams for this case, even if using timers might be a little simpler. However, you can schedule as many punctuations as you want (i.e., one per type in this case), and you can cancel them at any point (i.e., if you cancel a punctuation after it fired, it works like a timer). – Matthias J. Sax May 23 '20 at 22:03