
As is highly recommended by the documentation, I want to add uids to my operators in Flink for the purpose of savepointing. My job uses the Table API. I have not found in the documentation how to add uids to operators with a SQL query.

My code looks something like this:

StreamExecutionEnvironment env = ...;
StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);
Table table = tEnv.sqlQuery("SELECT * FROM mytable GROUP BY TUMBLE(col1, INTERVAL '10' SECOND)");
tEnv.writeToSink(table, someSink, qConfig);

If my understanding is correct, the TUMBLE window is backed by internal operator state. I therefore want to assign its operator a specific uid, to avoid the issues that can arise from autogenerated ids. What is the correct way to do this?

I am running Flink v1.6.2.

Stevenyc091

1 Answer


The Table API does not allow you to set a uid for operators. The problem is that the same SQL query might result in a different execution plan if it is compiled with a different Flink version. Setting uids would therefore not help if the plan changes completely. At the moment, it is effectively not possible to provide backwards compatibility for SQL queries.
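To illustrate why a changed plan breaks state restoration, here is a toy model in plain Java. It is only a sketch under stated assumptions, not Flink's actual ID-generation or restore logic: the class `SavepointSketch`, the helper `autoId`, and the operator names are all hypothetical. The model derives each operator's ID from its position and name in the plan, stores state under that ID, and on restore looks up state by the IDs of the new plan.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT Flink's real ID-generation or restore logic.
public class SavepointSketch {

    // Hypothetical stand-in for an auto-generated operator ID: derived from the
    // operator's position in the plan and its name, so it changes when the plan does.
    static String autoId(List<String> plan, int index) {
        return index + ":" + plan.get(index);
    }

    // Take a "savepoint": store each operator's state under its auto-generated ID.
    static Map<String, String> snapshot(List<String> plan, List<String> states) {
        Map<String, String> savepoint = new HashMap<>();
        for (int i = 0; i < plan.size(); i++) {
            savepoint.put(autoId(plan, i), states.get(i));
        }
        return savepoint;
    }

    // Restore: count how many operators of the new plan find state under their ID.
    static int restoredOperators(List<String> newPlan, Map<String, String> savepoint) {
        int matched = 0;
        for (int i = 0; i < newPlan.size(); i++) {
            if (savepoint.containsKey(autoId(newPlan, i))) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> planV1 = List.of("source", "window-aggregate", "sink");
        Map<String, String> savepoint =
                snapshot(planV1, List.of("offsets", "open-window-contents", "none"));

        // Same query, same plan: every operator finds its state again.
        System.out.println(restoredOperators(planV1, savepoint)); // prints 3

        // Hypothetical plan that a different planner version might produce
        // for the same query: one extra operator shifts everything after it.
        List<String> planV2 = List.of("source", "calc", "window-aggregate", "sink");
        System.out.println(restoredOperators(planV2, savepoint)); // prints 1
    }
}
```

In this model the window operator's state is still in the savepoint after the plan change, but nothing in the new plan asks for it under the old ID. That is the sense in which a uid is only useful as long as the operator it names continues to exist in the plan.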

Till Rohrmann
    Is there still an auto-generated id? Use case: I have a query with a 1-hour TUMBLE window. I cancel the job with a savepoint after 30 minutes. Assuming I have changed neither the query nor the Flink version (i.e. the query plan is the same), I start the job again 5 minutes later, passing the savepoint path. Will the state loaded from the savepoint include the 30 minutes of the TUMBLE window that have already passed? Or is state from SQL queries excluded from the savepoint entirely as part of Flink's internals? – Stevenyc091 Apr 03 '19 at 03:51