I'm just starting to learn ksqlDB, and I've picked an exercise to work through.
I have a Kafka topic that continuously receives log records from my machine. I want to deduplicate the events in the topic and produce an aggregated, windowed representation of the logs. For example, I receive messages like the following about 10 times per minute:
May 23 19:08:12 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 12340
May 23 19:08:14 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 56780
May 23 19:08:20 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 12340
May 23 19:08:34 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 56780
For each 10-minute window, I would like to consolidate the records into a single event per distinct message, which might look like this:
{ "first_timestamp": "May 23 19:08:12", "last_timestamp": "May 23 19:08:20", "message": "my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 12340", "occurrences": 2 }
{ "first_timestamp": "May 23 19:08:14", "last_timestamp": "May 23 19:08:34", "message": "my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 56780", "occurrences": 2 }
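To make the intended transformation concrete, here is a minimal Python sketch of the grouping I have in mind. It is not ksqlDB, and it is only an illustration: the `consolidate` function and `TIMESTAMP_LEN` constant are names I made up, it assumes every line starts with a 15-character syslog timestamp, that all lines passed in belong to the same 10-minute window, and that they arrive in timestamp order.

```python
# Illustration only: plain Python, not ksqlDB. All names here are hypothetical.
SAMPLE_LINES = [
    "May 23 19:08:12 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 12340",
    "May 23 19:08:14 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 56780",
    "May 23 19:08:20 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 12340",
    "May 23 19:08:34 my-host sshd[1234]: Invalid user alpha from 127.0.0.1 port 56780",
]

# Assumption: the timestamp prefix is always 15 characters, e.g. "May 23 19:08:12".
TIMESTAMP_LEN = 15


def consolidate(lines):
    """Collapse the lines of one window into one event per distinct message.

    Assumes the lines are already in timestamp order, so the first line seen
    for a message carries its first timestamp and the most recent line seen
    carries its last timestamp.
    """
    groups = {}  # message text -> aggregated event (dicts keep insertion order)
    for line in lines:
        timestamp = line[:TIMESTAMP_LEN]
        message = line[TIMESTAMP_LEN + 1:]  # skip the space after the timestamp
        event = groups.get(message)
        if event is None:
            groups[message] = {
                "first_timestamp": timestamp,
                "last_timestamp": timestamp,
                "message": message,
                "occurrences": 1,
            }
        else:
            event["last_timestamp"] = timestamp
            event["occurrences"] += 1
    return list(groups.values())


if __name__ == "__main__":
    for event in consolidate(SAMPLE_LINES):
        print(event)
```

What I am looking for is the ksqlDB equivalent of this: group by the message text within a window, and keep the earliest timestamp, the latest timestamp, and a count.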
It may be a long shot to get exactly this format, but I would really appreciate any comments or thoughts on how to achieve it.
Many thanks!