
My requirement is to write an Akka Streams application that consumes a continuous stream of events from Kafka and sessionizes them within a time window, grouping on an id value embedded in each event.

For example, say my time window is two minutes, and in the first two minutes I receive the following four events:

Input:

{"message-domain":"1234","id":1,"aaa":"bbb"}
{"message-domain":"1234","id":2,"aaa":"bbb"}
{"message-domain":"5678","id":4,"aaa":"bbb"}
{"message-domain":"1234","id":3,"aaa":"bbb"}

After grouping (sessionizing) these events, the output should contain only two events, one per message-domain value.

Output:

{"message-domain":"1234","messages":[{"id":1,"aaa":"bbb"},{"id":2,"aaa":"bbb"},{"id":3,"aaa":"bbb"}]}
{"message-domain":"5678","messages":[{"id":4,"aaa":"bbb"}]}
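Setting Kafka and JSON parsing aside, the grouping step itself is a plain `groupBy` over the events of one window. A minimal sketch, assuming the events have already been decoded into a case class; `Event`, `Message`, `Session` and `sessionize` are hypothetical names, not part of any library:

```scala
// Hypothetical model of the question's events; JSON decoding is deliberately left out.
final case class Event(domain: String, id: Int, aaa: String)
final case class Message(id: Int, aaa: String)
final case class Session(domain: String, messages: Seq[Message])

object Sessionizer {
  // Collapse one window's events into one Session per message-domain.
  // Domains keep their order of first appearance; messages keep arrival order.
  def sessionize(window: Seq[Event]): Seq[Session] = {
    val byDomain = window.groupBy(_.domain)
    window.map(_.domain).distinct.map { domain =>
      Session(domain, byDomain(domain).map(e => Message(e.id, e.aaa)))
    }
  }
}
```

Applied to the four input events above, `sessionize` returns two sessions: `1234` with ids 1, 2 and 3, and `5678` with id 4.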

And I want this to happen in real time. Any suggestions on how to achieve this?

Asked by dks551 (edited by Jeffrey Chung)

1 Answer


To group the events within a time window, you can use `Flow.groupedWithin`:

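import akka.NotUsed
import akka.stream.scaladsl.Flow
import scala.collection.immutable
import scala.concurrent.duration._

// maxCount is effectively unlimited, so batches are bounded only by the time window
val maxCount: Int = Int.MaxValue
val timeWindow: FiniteDuration = 2.minutes

val timeWindowFlow: Flow[String, immutable.Seq[String], NotUsed] =
  Flow[String].groupedWithin(maxCount, timeWindow)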
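Chaining a per-batch `groupBy` onto `groupedWithin` yields one map per window, keyed by message-domain. A sketch assuming Akka 2.6 or later (where an implicit `ActorSystem` supplies the materializer); `WindowedSessions`, `domainOf` and `sessionFlow` are hypothetical, and the regex is only a stand-in for a real JSON parser:

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

object WindowedSessions {
  // Hypothetical key extractor: a regex stands in for a JSON library here.
  private val DomainPattern = "\"message-domain\"\\s*:\\s*\"([^\"]*)\"".r

  def domainOf(json: String): String =
    DomainPattern.findFirstMatchIn(json).map(_.group(1)).getOrElse("unknown")

  // Batch raw events by processing-time window, then group each batch by message-domain.
  def sessionFlow(window: FiniteDuration): Flow[String, Map[String, immutable.Seq[String]], NotUsed] =
    Flow[String]
      .groupedWithin(Int.MaxValue, window)
      .map(batch => batch.groupBy(domainOf))
}
```

In the real application the `Source` would be the Kafka consumer (for example one from Alpakka Kafka) rather than a fixed list. Note that `groupedWithin` windows on processing time, not on any timestamp inside the events, and holds each window in memory; with `Int.MaxValue` as the count, a busy window is bounded only by traffic volume.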
Answered by Ramón J Romero y Vigil
  • How do you group the events by a particular key here, so that events with the same message-domain value end up as a single event after grouping? – dks551 Mar 12 '18 at 16:22
  • @dks551 That part of your question is much more involved. It requires significant string manipulation to get the format that you are looking for. Could it instead return a `Map` with key `1234` and a value of `Seq[String]`, where the values in the sequence are the original event strings? Or convert each string into a case class? – Ramón J Romero y Vigil Mar 12 '18 at 16:30
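The string manipulation becomes manageable if every event is a flat JSON object whose first field is `message-domain`, as in the question. A hedged sketch under that assumption; `SessionRenderer`, `stripDomain` and `render` are hypothetical, and a JSON library would be more robust than regex editing (for instance, the domain value is not JSON-escaped here):

```scala
object SessionRenderer {
  // Hypothetical helper: matches the "message-domain" field plus its trailing comma.
  // Assumes the field comes first in a flat object, as in the question's events.
  private val DomainField = "\"message-domain\"\\s*:\\s*\"[^\"]*\"\\s*,?\\s*".r

  // Drops the "message-domain" field from one raw event string.
  def stripDomain(json: String): String = DomainField.replaceFirstIn(json, "")

  // Builds {"message-domain":"<domain>","messages":[...]} from one group of raw events.
  def render(domain: String, events: Seq[String]): String =
    events.map(stripDomain).mkString(s"""{"message-domain":"$domain","messages":[""", ",", "]}")
}
```

Mapping `render` over each key/value pair of a per-window `Map[String, Seq[String]]` yields one output line per message-domain.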