I have a process that feeds relatively simple vehicle data into a kafka topic. The records are keyd by registration and the values contain things like latitude/longitude etc + a value called DateTime
which is a timestamp based on the sensor that took the readings (not the producer or the cluster).
My data arrives out of order in general and also especially if I keep on pumping the same test data set into the vehicle-update-log
topic over and over. My data set contains two records for the vehicle I'm testing with.
My expectation is that when I do a select on the table, that it will return one row with the most recent data based on the ROWTIME
of the records. (I've verified that the ROWTIME
is getting set correctly.)
What happens instead is that the result has both rows (for the same primary KEY) and the last value is the oldest ROWTIME
.
I'm confused; I thought ksql will keep the most recent update only. Must I now write additional logic on the client side to pick the latest of the data I get?
I created the table like this:
CREATE TABLE vehicle_updates
(
Latitude DOUBLE,
Longitude DOUBLE,
DateTime BIGINT,
Registration STRING PRIMARY KEY
)
WITH
(
KAFKA_TOPIC = 'vehicle-update-log',
VALUE_FORMAT = 'JSON_SR',
TIMESTAMP = 'DateTime'
);
Here is my query:
SELECT
registration,
ROWTIME,
TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss.SSS', 'Africa/Johannesburg') AS rowtime_formatted
FROM vehicle_updates
WHERE registration = 'BT66MVE'
EMIT CHANGES;
Results while no data is flowing:
+------------------------------+------------------------------+------------------------------+
|REGISTRATION |ROWTIME |ROWTIME_FORMATTED |
+------------------------------+------------------------------+------------------------------+
|BT66MVE |1631532052000 |2021-09-13 13:20:52.000 |
|BT66MVE |1631527147000 |2021-09-13 11:59:07.000 |
Here's the same query, but I'm pumping the data set into the topic again while the query is running. I'm surprised to be getting the older record as updates.
Results while feeding data:
+------------------------------+------------------------------+------------------------------+
|REGISTRATION |ROWTIME |ROWTIME_FORMATTED |
+------------------------------+------------------------------+------------------------------+
|BT66MVE |1631532052000 |2021-09-13 13:20:52.000 |
|BT66MVE |1631527147000 |2021-09-13 11:59:07.000 |
|BT66MVE |1631527147000 |2021-09-13 11:59:07.000 |
What gives?