I have a Flink application that processes data from two streams. I am using the Table API: I want to consume data from one stream (stream1), query the other stream (stream2), and get the record with the latest timestamp.

This is what I have now:

    def insert_into_output(output_table_name, event_table_name, code_table_name):
        # {0} = output table, {1} = event table, {2} = code table
        return """
        INSERT INTO {0} (ip, sn, code, timestamp)
        SELECT DISTINCT
        ip, sn, code, timestamp
        FROM {2} WHERE
        sn =
        (SELECT
        sn
        FROM {1}
        WHERE timestamp =
        (SELECT MAX(timestamp) FROM {1}))
        """.format(output_table_name, event_table_name, code_table_name)

Unfortunately, I am getting an error stating that the sink "doesn't support consuming update and delete changes which is produced by node GroupAggregate(groupBy=[ip, sn, code, timestamp], select=[ip, sn, code, timestamp])". Any ideas?

Dan

1 Answer

Because your SQL query uses MAX(timestamp), its result can change continuously: a higher timestamp could arrive now than the one that was the maximum 5 minutes ago. The result of this SQL statement is therefore a retract stream. You can read more about this in the Table to Stream conversion documentation.

You're emitting to Kinesis, but that sink doesn't support retract streams, only append streams.
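To make the retract behaviour concrete, here is a minimal plain-Python sketch. It does not use Flink at all; it only models the changelog that an unbounded MAX aggregation would produce. The function names `running_max_changelog` and `is_append_only` are hypothetical and exist only for this illustration. The change kinds follow Flink's short notation (`+I` insert, `-U` update-before, `+U` update-after).

```python
from typing import Iterable, List, Tuple

# (change kind, value); kinds use Flink's short notation: +I, -U, +U
Change = Tuple[str, int]


def running_max_changelog(timestamps: Iterable[int]) -> List[Change]:
    """Model the changelog of an unbounded MAX(timestamp) aggregation.

    This is an illustration only, not Flink code.
    """
    changes: List[Change] = []
    current_max = None
    for ts in timestamps:
        if current_max is None:
            # First row: the result is inserted.
            changes.append(("+I", ts))
            current_max = ts
        elif ts > current_max:
            # A higher timestamp arrived: retract the old result,
            # then emit the new one.
            changes.append(("-U", current_max))
            changes.append(("+U", ts))
            current_max = ts
        # Rows that do not raise the maximum produce no change.
    return changes


def is_append_only(changes: Iterable[Change]) -> bool:
    """An append-only sink can accept a changelog made only of inserts."""
    return all(kind == "+I" for kind, _ in changes)


changelog = running_max_changelog([100, 105, 103, 110])
for kind, value in changelog:
    print(kind, value)
print("append-only:", is_append_only(changelog))
```

Every time a higher timestamp arrives, the previous result has to be retracted, so the changelog is not append-only. That is the situation the error message describes: the sink would have to consume update and delete changes.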

Martijn Visser