We are using KSQL to perform some aggregations / filtering on a real time data. One of the use case we have is, we need to perform some operation on last N days of a particular activity, this would be continuous operation. So this needs to be hopping window.
When I tried the query, with hopping duration as M days, KSQL query returned M records instead of 1 (which was hoping for).
Query :
select PROTO,
TIMESTAMPTOSTRING(WindowStart(), 'yyyy-MM-dd''T''HH:mm:ss''Z''', 'UTC') as "timestamp",
TIMESTAMPTOSTRING(WindowEnd(), 'yyyy-MM-dd''T''HH:mm:ss''Z''', 'UTC'),
COUNT(PROTO) AS Count
FROM DATASTREAM
WINDOW HOPPING (SIZE 5 DAYS, ADVANCE BY 1 DAY)
WHERE MSG like '%SOMESTRING%'
AND SPLIT(PROTO, '/')[0] = 'tcp'
GROUP BY PROTO;
tcp/22 | 2020-01-27T00:00:00Z | 2020-02-01T00:00:00Z | 1
tcp/22 | 2020-01-28T00:00:00Z | 2020-02-02T00:00:00Z | 1
tcp/22 | 2020-01-29T00:00:00Z | 2020-02-03T00:00:00Z | 1
tcp/22 | 2020-01-30T00:00:00Z | 2020-02-04T00:00:00Z | 1
tcp/22 | 2020-01-31T00:00:00Z | 2020-02-05T00:00:00Z | 1
Is there any way to get only first record only, or the records for which end time <= current time or any other workaround to get 1 result per window ?
Please consider below data records.
{ "time": "2020-01-25 23:36:37 UTC", "msg": "Error"}
{ "time": "2020-01-25 23:36:38 UTC", "msg": "Error"}
{ "time": "2020-01-25 23:36:40 UTC", "msg": "Error"}
{ "time": "2020-01-26 23:36:37 UTC", "msg": "Error"}
{ "time": "2020-01-26 23:36:38 UTC", "msg": "Error"}
{ "time": "2020-01-26 23:36:39 UTC", "msg": "Error"}
{ "time": "2020-01-26 23:36:40 UTC", "msg": "Error"}
{ "time": "2020-01-27 23:36:37 UTC", "msg": "Error"}
{ "time": "2020-01-27 23:36:38 UTC", "msg": "Error"}
{ "time": "2020-01-27 23:36:39 UTC", "msg": "Error"}
{ "time": "2020-01-28 23:36:37 UTC", "msg": "Error"}
{ "time": "2020-01-28 23:36:38 UTC", "msg": "Error"}
{ "time": "2020-01-29 23:36:37 UTC", "msg": "Error"}
{ "time": "2020-01-29 23:36:38 UTC", "msg": "Error"}
{ "time": "2020-01-29 23:36:39 UTC", "msg": "Error"}
{ "time": "2020-01-29 23:36:40 UTC", "msg": "Error"}
I am looking for count of records which have, msg as Error past 2 days.
If I fire the KSQL query on 25th at 23:36:37, I would be expecting result as :
2020-01-25T23:36:37Z | 1
2020-01-25T23:36:38Z | 2
2020-01-25T23:36:40Z | 3
2020-01-26T23:36:37Z | 4
2020-01-26T23:36:38Z | 5
2020-01-26T23:36:39Z | 6
2020-01-26T23:36:40Z | 7
2020-01-27T23:36:37Z | 5
2020-01-27T23:36:38Z | 6
2020-01-27T23:36:39Z | 7
2020-01-28T23:36:37Z | 4
2020-01-28T23:36:38Z | 5
2020-01-29T23:36:37Z | 3
2020-01-29T23:36:38Z | 4
2020-01-29T23:36:39Z | 5
2020-01-29T23:36:40Z | 6