This is the input Kafka topic, which contains ConnectionEvent records:
ConnectionEvent("John", "123", "CONNECTED")
ConnectionEvent("John", "123", "DISCONNECTED")
ConnectionEvent("Anna", "222", "CONNECTED")
ConnectionEvent("Rohan", "334", "CONNECTED")
ConnectionEvent("Anna", "199", "CONNECTED")
ConnectionEvent("Anna", "255", "CONNECTED")
ConnectionEvent("Anna", "255", "DISCONNECTED")
ConnectionEvent("Anna", "222", "DISCONNECTED")
Streaming & Reduction logic
Each record in the topic is sent with the user id as the message key, e.g. "Anna".
The stream has to be processed in the following way:
- John has only 1 session (123), which connected and then disconnected, so he is offline.
- Rohan has only 1 session (334), which has not disconnected, so he is online.
- Anna has 3 sessions (222, 199, 255), of which 2 have disconnected, so she is online.
The resulting KTable must contain the following data:
John Offline
Rohan Online
Anna Online
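To make the expected semantics concrete, here is a minimal plain-Java sketch of the rule I am after, with no Kafka involved. The class `PresenceSketch`, the `Event` stand-in for ConnectionEvent, and the `presence` method are hypothetical names used only for illustration: a user is online as long as at least one of their sessions has CONNECTED as its latest state.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PresenceSketch {
    // Hypothetical stand-in for ConnectionEvent (user, sessionId, state).
    static final class Event {
        final String user;
        final String sessionId;
        final String state;

        Event(String user, String sessionId, String state) {
            this.user = user;
            this.sessionId = sessionId;
            this.state = state;
        }
    }

    // Replays the events in order and tracks, per user, the set of sessions
    // whose latest state is CONNECTED. A user with an empty set is offline.
    static Map<String, String> presence(List<Event> events) {
        Map<String, Set<String>> openSessions = new LinkedHashMap<>();
        for (Event e : events) {
            Set<String> sessions =
                openSessions.computeIfAbsent(e.user, u -> new HashSet<>());
            if ("CONNECTED".equals(e.state)) {
                sessions.add(e.sessionId);
            } else {
                sessions.remove(e.sessionId);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        openSessions.forEach((user, sessions) ->
            result.put(user, sessions.isEmpty() ? "Offline" : "Online"));
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("John", "123", "CONNECTED"),
            new Event("John", "123", "DISCONNECTED"),
            new Event("Anna", "222", "CONNECTED"),
            new Event("Rohan", "334", "CONNECTED"),
            new Event("Anna", "199", "CONNECTED"),
            new Event("Anna", "255", "CONNECTED"),
            new Event("Anna", "255", "DISCONNECTED"),
            new Event("Anna", "222", "DISCONNECTED"));
        // prints {John=Offline, Anna=Online, Rohan=Online}
        System.out.println(presence(events));
    }
}
```

This is only the batch equivalent of what the stream should compute incrementally; the per-user set of open sessions is the state I would like the KTable to be derived from.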
What I tried is this:
KTable<String, ConnectionEvent> connectedSessions = stream
    .groupBy((user, event) -> user + ":" + event.getSessionId()) // Group by a composite key (user, sessionId); the ":" separator is an arbitrary choice
    .reduce((agg, newVal) -> newVal) // Keep the latest value, i.e. reduce the records of each session to one
    .filter((key, event) -> "CONNECTED".equals(event.getState())); // Keep only sessions whose latest state is CONNECTED
But now, how do I ungroup the composite key (user, sessionId) back to just the user, and then mark each user as online or offline based on the number of sessionIds whose latest state is CONNECTED?