This is the input Kafka topic, which contains ConnectionEvent records:

ConnectionEvent("John", "123", "CONNECTED")
ConnectionEvent("John", "123", "DISCONNECTED")
ConnectionEvent("Anna", "222", "CONNECTED")
ConnectionEvent("Rohan", "334", "CONNECTED")
ConnectionEvent("Anna", "199", "CONNECTED")
ConnectionEvent("Anna", "255", "CONNECTED")
ConnectionEvent("Anna", "255", "DISCONNECTED")
ConnectionEvent("Anna", "222", "DISCONNECTED")
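For concreteness, the events above can be modeled as a minimal POJO (a sketch; the field and getter names are assumptions, not taken from the original code):

```java
// Minimal sketch of the event type used in the snippets below.
// Field/getter names are assumptions.
public class ConnectionEvent {
    private final String userId;
    private final String sessionId;
    private final String state; // "CONNECTED" or "DISCONNECTED"

    public ConnectionEvent(String userId, String sessionId, String state) {
        this.userId = userId;
        this.sessionId = sessionId;
        this.state = state;
    }

    public String getUserId()    { return userId; }
    public String getSessionId() { return sessionId; }
    public String getState()     { return state; }
}
```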

Streaming & Reduction logic

Each item in the topic is sent with the user id as the message key, e.g. "Anna".
The stream has to be processed in the following way:

  • John has only one session (123), which connected and then disconnected, so he is offline
  • Rohan has only one session (334), which never disconnected, so he is online
  • Anna has three sessions (222, 199, 255), of which two are disconnected, so she is online

The KTable must have the following data:

John Offline
Rohan Online
Anna Online
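To make that expected table concrete, the reduction can be sketched as a plain-Java fold over the events (no Kafka involved): keep the set of currently connected sessions per user; a user is Online exactly when that set is non-empty.

```java
import java.util.*;

public class StatusFold {
    // Folds events into userId -> "Online"/"Offline" by tracking open sessions.
    // Each event is a {userId, sessionId, state} triple.
    public static Map<String, String> fold(List<String[]> events) {
        Map<String, Set<String>> openSessions = new LinkedHashMap<>();
        for (String[] e : events) {
            Set<String> open = openSessions.computeIfAbsent(e[0], k -> new HashSet<>());
            if ("CONNECTED".equals(e[2])) open.add(e[1]);
            else open.remove(e[1]);
        }
        Map<String, String> status = new LinkedHashMap<>();
        openSessions.forEach((user, open) ->
            status.put(user, open.isEmpty() ? "Offline" : "Online"));
        return status;
    }

    public static void main(String[] args) {
        List<String[]> events = Arrays.asList(
            new String[]{"John", "123", "CONNECTED"},
            new String[]{"John", "123", "DISCONNECTED"},
            new String[]{"Anna", "222", "CONNECTED"},
            new String[]{"Rohan", "334", "CONNECTED"},
            new String[]{"Anna", "199", "CONNECTED"},
            new String[]{"Anna", "255", "CONNECTED"},
            new String[]{"Anna", "255", "DISCONNECTED"},
            new String[]{"Anna", "222", "DISCONNECTED"});
        System.out.println(fold(events)); // {John=Offline, Anna=Online, Rohan=Online}
    }
}
```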

What I tried is this:

KTable<String, String> connectedSessions = stream
            .groupBy((k, v) -> k + "#" + v.getSessionId()) // Group by composite key (user, sessionId)
            .reduce((agg, newVal) -> newVal)               // Keep the latest record for each session
            .filter((k, v) -> "CONNECTED".equals(v.getState())); // Keep only sessions whose last state is CONNECTED

But now, how will I ungroup the composite key (user, sessionId) back to only the user, and then mark the user as online/offline based on the number of sessionIds whose latest state is CONNECTED?

cppcoder

1 Answer

AFAIU a user is online as long as the number of his CONNECTED events is larger than the number of his DISCONNECTED events. So you can aggregate the number of open connections in your stream and check whether it is positive. Something like:

        KTable<String, String> connectedSessions = stream
                .groupByKey()
                .aggregate(
                        () -> 0,
                        (k, v, numberOfConnections) -> "CONNECTED".equals(v.getState())
                                ? numberOfConnections + 1
                                : numberOfConnections - 1,
                        Materialized.with(Serdes.String(), Serdes.Integer()))
                .mapValues(numberOfConnections -> numberOfConnections > 0 ? "Online" : "Offline");
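As a quick sanity check (plain Java, no Kafka involved), the counter approach can be replayed over the sample events; it gives the expected table here, since in this stream every DISCONNECTED has a matching earlier CONNECTED:

```java
import java.util.*;

public class CounterCheck {
    // Folds (userId, state) pairs into userId -> connection counter.
    // Session ids are irrelevant to the counter approach.
    public static Map<String, Integer> countConnections(String[][] events) {
        Map<String, Integer> connections = new LinkedHashMap<>();
        for (String[] e : events) {
            connections.merge(e[0], "CONNECTED".equals(e[1]) ? 1 : -1, Integer::sum);
        }
        return connections;
    }

    public static void main(String[] args) {
        String[][] events = {
            {"John", "CONNECTED"}, {"John", "DISCONNECTED"},
            {"Anna", "CONNECTED"}, {"Rohan", "CONNECTED"},
            {"Anna", "CONNECTED"}, {"Anna", "CONNECTED"},
            {"Anna", "DISCONNECTED"}, {"Anna", "DISCONNECTED"}};
        countConnections(events).forEach((user, n) ->
            System.out.println(user + " " + (n > 0 ? "Online" : "Offline")));
        // John Offline
        // Anna Online
        // Rohan Online
    }
}
```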
Peyman
  • What happens when a CONNECTED record is gone due to the retention policy of the Kafka topic? Then this logic will break – cppcoder Jun 18 '20 at 17:15
  • So I think a reliable logic would be to check that at least one session has CONNECTED as its last state for the user to be online – cppcoder Jun 18 '20 at 17:17
  • Every aggregation materializes the result in an internal topic which has "compaction" as its cleanup policy, which means Kafka guarantees that you don't lose the updates. – Peyman Jun 18 '20 at 17:43
  • What about this case: the current state is online, and now come multiple disconnects and one connect. He should still be online, since the disconnects were for previous sessions, but this logic would make him offline? – cppcoder Jun 18 '20 at 18:05
  • If you already missed the beginning of the stream and those events are now deleted, it is not easy to define the status of a connection. For example: John is CONNECTED in a session, but this event is no longer available and no DISCONNECTED ever comes for this session. You can't show the correct state if you don't see the entire stream from the beginning. – Peyman Jun 18 '20 at 18:21
  • It's enough, though, to see the events once and aggregate them into the new internal aggregation topic. Afterwards they can be deleted, whatever the cleanup policy is. – Peyman Jun 18 '20 at 18:32
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/216237/discussion-between-cppcoder-and-peyman). – cppcoder Jun 18 '20 at 18:37