
I am working with Kafka and KSQL. I would like to find the last row within each 5-minute period for each DEV_NAME (the ROWKEY). To do that, I created a stream and an aggregated table to join against.

With the KSQL below, I created a table that finds the latest row time within 5 minutes for each DEV_NAME:

CREATE TABLE TESTING_TABLE  AS
SELECT  ROWKEY AS DEV_NAME, max(ROWTIME) as LAST_TIME 
    FROM TESTING_STREAM WINDOW TUMBLING (SIZE 5 MINUTES)
    GROUP BY ROWKEY;

Then I would like to join the stream and the table:

CREATE STREAM TESTING_S_2 AS 
  SELECT *
    FROM TESTING_S  S
        INNER JOIN TESTING_T T
        ON    S.ROWKEY = T.ROWKEY
    WHERE  
    S.ROWTIME = T.LAST_TIME;

However, this produces the following error:

Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (org.apache.kafka.streams.kstream.TimeWindowedSerializer) is not compatible to the actual key type (key type: org.apache.kafka.connect.data.Struct). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.

I think the WINDOW TUMBLING clause changed the format of my ROWKEY

(e.g. DEV_NAME_11508 -> DEV_NAME_11508 : Window{start=157888092000 end=-})

So, without setting the Serdes, can I convert the table to a stream and set PARTITION BY DEV_NAME?

Hong

1 Answer


As you've identified, the issue is that your table is a windowed table. The key of a windowed table is itself windowed (the original key plus the window bounds), and you cannot do a lookup in a windowed table with a non-windowed key.
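To make the key mismatch concrete, here is a minimal sketch in plain Python (a model of the idea, not KSQL or Kafka Streams code). The `WindowedKey` type, the `upsert` helper and the timestamps are all hypothetical, chosen only for illustration:

```python
from typing import Dict, NamedTuple

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling window


class WindowedKey(NamedTuple):
    """Hypothetical stand-in for a windowed key: original key plus window start."""
    key: str
    window_start_ms: int


def window_start(ts_ms: int) -> int:
    # Tumbling windows are aligned to multiples of the window size.
    return ts_ms - (ts_ms % WINDOW_MS)


# Model of the windowed table: (key, window start) -> max timestamp seen.
table: Dict[WindowedKey, int] = {}


def upsert(key: str, ts_ms: int) -> None:
    wk = WindowedKey(key, window_start(ts_ms))
    table[wk] = max(table.get(wk, ts_ms), ts_ms)


# Two events for the same device that fall in the same 5-minute window.
upsert("DEV_NAME_11508", 1_578_880_920_000)
upsert("DEV_NAME_11508", 1_578_880_950_000)

# Looking up with the plain (non-windowed) key finds nothing...
plain_lookup = table.get("DEV_NAME_11508")

# ...while a lookup with the full windowed key succeeds.
windowed_lookup = table.get(
    WindowedKey("DEV_NAME_11508", window_start(1_578_880_950_000))
)
```

In this model the stream side of the join only ever has the plain key, so it can never match an entry in the windowed table.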

Your table, as it stands, will generate one unique row per ROWKEY for each 5-minute window. Yet it seems like you don't care about anything but the most recent window. It may be that you don't need the windowing in the table, e.g.

CREATE TABLE TESTING_TABLE AS 
   SELECT 
     ROWKEY AS DEV_NAME, 
     max(ROWTIME) as LAST_TIME  
   FROM TESTING_STREAM 
   WHERE ROWTIME > (UNIX_TIMESTAMP() - 300000) 
   GROUP BY ROWKEY;

This will track the max timestamp per key, ignoring any event whose timestamp is more than 5 minutes old. (Of course, this check is only done at the time the event is received; the row isn't removed from the table after 5 minutes.)
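The caveat about the check happening only on receipt can be sketched in plain Python. This is a simplified model, not KSQL: `apply_event` is a hypothetical helper, `now_ms` stands in for `UNIX_TIMESTAMP()` at processing time, and the timestamps are made up:

```python
from typing import Dict

FIVE_MIN_MS = 300_000


def apply_event(table: Dict[str, int], key: str, rowtime_ms: int, now_ms: int) -> None:
    """Model of: WHERE ROWTIME > (UNIX_TIMESTAMP() - 300000) ... GROUP BY ROWKEY.

    The filter is evaluated once, when the event is processed.
    """
    if rowtime_ms > now_ms - FIVE_MIN_MS:
        table[key] = max(table.get(key, rowtime_ms), rowtime_ms)


table: Dict[str, int] = {}

# A fresh event is accepted.
apply_event(table, "DEV_A", rowtime_ms=1_000_000, now_ms=1_000_500)

# An event that is already more than 5 minutes old on arrival is ignored.
apply_event(table, "DEV_A", rowtime_ms=600_000, now_ms=1_001_000)

# Much later, with no new events, nothing re-evaluates the filter:
# the row for DEV_A is still in the table with its old LAST_TIME.
later_ms = 1_000_000 + 10 * 60 * 1000
still_present = "DEV_A" in table
age_ms = later_ms - table["DEV_A"]
```

So a device that stops sending data keeps its last row indefinitely in this model; expiring it would need something beyond the `WHERE` clause.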

Also, this join:

CREATE STREAM TESTING_S_2 AS 
  SELECT *
    FROM TESTING_S  S
        INNER JOIN TESTING_T T
        ON    S.ROWKEY = T.ROWKEY
    WHERE  
    S.ROWTIME = T.LAST_TIME;

almost certainly isn't doing what you think, and wouldn't work the way you want due to race conditions.
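One possible reading of that race, sketched in plain Python under strong assumptions: the stream and the table are fed from the same events, timestamps arrive in order, and the only thing that varies is whether the table update or the stream-side lookup is processed first. The `run` function is hypothetical and does not model ksqlDB internals:

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (key, rowtime)


def run(events: List[Event], table_first: bool) -> List[Event]:
    """Return the events that would pass S.ROWTIME = T.LAST_TIME in this model."""
    table: Dict[str, int] = {}
    matches: List[Event] = []
    for key, ts in events:
        if table_first:
            # Table sees the event before the stream-side lookup.
            table[key] = max(table.get(key, ts), ts)
        if table.get(key) == ts:
            matches.append((key, ts))
        if not table_first:
            # Table sees the event only after the stream-side lookup.
            table[key] = max(table.get(key, ts), ts)
    return matches


events = [("DEV_A", 100), ("DEV_A", 200), ("DEV_A", 300)]

# Table updated first: every event is the current max when it is joined,
# so every event matches, not just the last one.
all_match = run(events, table_first=True)

# Stream processed first: the table still holds the previous max (or nothing),
# so no event matches.
none_match = run(events, table_first=False)
```

Under these assumptions neither ordering yields "only the last row in the period", which is consistent with the warning above.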

It's not clear what you're trying to achieve. Adding more information about your source data and required output may help people to provide you with a solution.

Andrew Coates