I'm starting to learn ksqlDB with an exercise.

I have a Kafka topic that continuously receives log records from my machine. My goal is to deduplicate the events in the topic and produce an aggregated, windowed representation of the logs. For example, I receive messages like the ones below about 10 times per minute:

May 23 19:08:12 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 12340
May 23 19:08:14 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 56780
May 23 19:08:20 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 12340
May 23 19:08:34 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 56780

I want to consolidate each 10-minute window into a single event per distinct message, which might look like this:

{ "first_timestamp": "May 23 19:08:12", "last_timestamp": "May 23 19:08:20", "message": "my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 12340", "occurrences": 2 }
{ "first_timestamp": "May 23 19:08:14", "last_timestamp": "May 23 19:08:34", "message": "my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 56780", "occurrences": 2 }
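
To pin down the semantics I'm after, here is a minimal Python sketch of the consolidation. It is not ksqlDB, only a plain model of the grouping logic. The assumptions are mine: each record is a raw syslog line whose first 15 characters are the timestamp, the year is 2023 (syslog lines carry none), windows are fixed 10-minute buckets, and the name `consolidate` is hypothetical.

```python
from datetime import datetime

TS_LEN = 15            # length of a syslog timestamp such as "May 23 19:08:12"
ASSUMED_YEAR = 2023    # assumption: syslog lines do not include a year
WINDOW_MINUTES = 10    # assumption: fixed (tumbling) 10-minute windows


def consolidate(lines):
    """Collapse identical messages within each 10-minute window into one event."""
    # (window_start, message) -> [(first_dt, first_text), (last_dt, last_text), count]
    groups = {}
    for line in lines:
        stamp, message = line[:TS_LEN], line[TS_LEN + 1:]
        ts = datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
        # Round the timestamp down to the start of its 10-minute bucket.
        window_start = ts.replace(
            minute=ts.minute - ts.minute % WINDOW_MINUTES, second=0, microsecond=0
        )
        state = groups.setdefault(
            (window_start, message), [(ts, stamp), (ts, stamp), 0]
        )
        state[0] = min(state[0], (ts, stamp))  # earliest occurrence so far
        state[1] = max(state[1], (ts, stamp))  # latest occurrence so far
        state[2] += 1
    return [
        {
            "first_timestamp": first[1],
            "last_timestamp": last[1],
            "message": message,
            "occurrences": count,
        }
        for (_, message), (first, last, count) in groups.items()
    ]


sample = [
    "May 23 19:08:12 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 12340",
    "May 23 19:08:14 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 56780",
    "May 23 19:08:20 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 12340",
    "May 23 19:08:34 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 56780",
]
for event in consolidate(sample):
    print(event)
```

The grouping key is the full message text after the timestamp, so two lines that differ only in port number count as different messages, as in the example above.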

Getting to exactly this format may be a long shot, but I would really appreciate any comments or thoughts on how to achieve it.

Much thanks!

  • Sounds like [session windows](https://docs.ksqldb.io/en/latest/concepts/time-and-windows-in-ksqldb-queries/#session-window) are almost what you want. But capturing global counts or "last seen" values would be better done as a table. – OneCricketeer May 24 '23 at 15:22
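
For readers unfamiliar with the term, a session window groups events with the same key for as long as consecutive events arrive within an inactivity gap, and starts a new session once the gap is exceeded. Below is a rough Python model of that idea, not ksqlDB code. The 10-minute gap, the inclusive boundary, and the name `sessionize` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=10)):
    """Group (timestamp, key) pairs into per-key sessions split by inactivity."""
    open_sessions = {}  # key -> session currently being extended
    closed = []         # sessions ended by a gap longer than `gap`
    for ts, key in sorted(events):
        session = open_sessions.get(key)
        if session is not None and ts - session["last"] <= gap:
            # Still within the inactivity gap: extend the current session.
            session["last"] = ts
            session["count"] += 1
        else:
            # Gap exceeded (or first event for this key): start a new session.
            if session is not None:
                closed.append(session)
            open_sessions[key] = {"key": key, "first": ts, "last": ts, "count": 1}
    return closed + list(open_sessions.values())


events = [
    (datetime(2023, 5, 23, 19, 8, 12), "port 12340"),
    (datetime(2023, 5, 23, 19, 8, 20), "port 12340"),
    (datetime(2023, 5, 23, 19, 30, 0), "port 12340"),  # more than 10 minutes later
]
for session in sessionize(events):
    print(session)
```

Unlike fixed 10-minute buckets, a session here has no set length: it keeps growing while the same message keeps arriving.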