
I would like to select the last element of each tumbling window using Flink SQL. I tried to achieve that with ROW_NUMBER in the Blink planner, using the following query:

```sql
SELECT * FROM (
  SELECT key, value, ROW_NUMBER() OVER w AS rn
    FROM InputTable
  WINDOW w AS (PARTITION BY key, TUMBLE(rt, INTERVAL '15' MINUTE) ORDER BY -ts)
) WHERE rn = 1
```

Here, rt = ts.rowtime and ts is a Long.

Unfortunately, this causes the following exception:

```
org.apache.flink.table.planner.codegen.CodeGenException: Unsupported call: TUMBLE(TIMESTAMP(3) *ROWTIME*, INTERVAL SECOND(3) NOT NULL) 
If you think this function should be supported, you can create an issue and start a discussion for it.
```

Any idea what I am doing wrong? I was thinking of the TUMBLE function as something "equivalent" to calculating rowtime % interval.
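The intuition in the question can be made concrete. Assuming windows are aligned to the epoch with no offset, a row with timestamp t falls into the tumbling window that starts at t - (t mod size). The following Python sketch (the function names are hypothetical, not Flink API, and timestamps are assumed to be epoch milliseconds) computes that window start and then applies the batch semantics the query aims for: for each (key, window), keep the row with the largest ts.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int]  # (key, value, ts), ts assumed to be epoch milliseconds

FIFTEEN_MINUTES_MS = 15 * 60 * 1000


def tumble_start(ts_ms: int, size_ms: int = FIFTEEN_MINUTES_MS) -> int:
    """Start of the epoch-aligned tumbling window that contains ts_ms."""
    # For a positive divisor, Python's % is never negative, so this also
    # rounds down correctly for timestamps before the epoch.
    return ts_ms - ts_ms % size_ms


def last_per_window(
    rows: Iterable[Row], size_ms: int = FIFTEEN_MINUTES_MS
) -> Dict[Tuple[str, int], Row]:
    """Keep the row with the largest ts for every (key, window start) pair."""
    latest: Dict[Tuple[str, int], Row] = {}
    for key, value, ts in rows:
        group = (key, tumble_start(ts, size_ms))
        # Mirrors ROW_NUMBER() ... ORDER BY ts DESC with rn = 1;
        # on ties, the first row seen wins.
        if group not in latest or ts > latest[group][2]:
            latest[group] = (key, value, ts)
    return latest


rows = [
    ("a", "x", 1 * 60_000),   # minute 1  -> window [0, 15)
    ("a", "y", 14 * 60_000),  # minute 14 -> window [0, 15)
    ("a", "z", 16 * 60_000),  # minute 16 -> window [15, 30)
]
print(last_per_window(rows))
# {('a', 0): ('a', 'y', 840000), ('a', 900000): ('a', 'z', 960000)}
```

Rows only two minutes apart (minutes 14 and 16) end up in different groups, because the grouping key is the window start rather than a sliding range around each row.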

bottaio

1 Answer


TUMBLE (as well as HOP and SESSION) are special built-in functions in Flink SQL (version 1.11) that can only be used in a GROUP BY clause. In principle, you are right, and it should be fine to use TUMBLE in this context, but it is simply not supported at this point.

You could implement a user-defined function that re-implements the grouping logic of TUMBLE. However, I would not recommend that, because the query would not perform well. Flink SQL would not be aware that a partition (PARTITION BY key, TUMBLE(rt, INTERVAL '15' MINUTE)) would only be "active" for 15 minutes, so it would keep that partition's state forever. Hence, the query would accumulate more and more state over time, which slows down checkpointing and recovery. IMO, such time-based OVER partitions should be supported in the future, but AFAIK they are not yet.
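To illustrate the state argument, here is a simplified Python simulation (not Flink code; all names are hypothetical). It keeps the latest row per (key, window start), the way a ROW_NUMBER-based query partitioned by a UDF-computed window start would. Without any notion of window completion, an entry is never dropped. A window-aware operator, by contrast, can discard a window's state once the watermark passes the window's end. The sketch assumes rows arrive in timestamp order and that the watermark equals the largest timestamp seen (no out-of-order data, no allowed lateness).

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (key, value, ts), ts assumed to be epoch milliseconds


def state_size_after(rows: List[Row], size_ms: int, drop_completed_windows: bool) -> int:
    """Return how many (key, window) entries are still held after all rows are processed."""
    state: Dict[Tuple[str, int], Row] = {}
    for key, value, ts in rows:
        start = ts - ts % size_ms
        group = (key, start)
        if group not in state or ts > state[group][2]:
            state[group] = (key, value, ts)
        if drop_completed_windows:
            watermark = ts  # simplification: in-order input, no lateness
            # A window [start, start + size) is complete once the watermark reaches its end;
            # a real operator would emit the window's result here and then clear its state.
            for done in [g for g in state if g[1] + size_ms <= watermark]:
                del state[done]
    return len(state)


# One key, one row per minute for 10 hours, 15-minute windows (40 windows in total).
rows = [("k", f"v{i}", i * 60_000) for i in range(600)]
print(state_size_after(rows, 15 * 60_000, drop_completed_windows=False))  # 40
print(state_size_after(rows, 15 * 60_000, drop_completed_windows=True))   # 1
```

Without cleanup, the number of entries grows with every window (and key) ever seen; with cleanup, it stays bounded by the number of windows that are still open.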

Fabian Hueske