I'm new to Flink and building a simple aggregation pipeline, e.g., total sales amount per day, using the Table API. I see there are two options for creating a table: temporary and permanent. For a permanent table we also need to set up a catalog, e.g., Hive. So I am inclined to use temporary tables, which are easier to get started with, but I'm curious about the pros and cons of each.
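
For context, here is a rough sketch of what I'm trying; the Kafka connector options (broker address, topic, format) and field names are just placeholders:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DailySales {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Temporary table: metadata lives only in this session, no catalog needed.
        tEnv.executeSql(
                "CREATE TEMPORARY TABLE sales (" +
                "  amount DECIMAL(10, 2)," +
                "  ts TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'sales'," +
                "  'properties.bootstrap.servers' = 'broker:9092'," +
                "  'format' = 'json'" +
                ")");

        // Daily sales total via a tumbling window over event time.
        tEnv.executeSql(
                "SELECT window_start, SUM(amount) AS daily_total " +
                "FROM TABLE(TUMBLE(TABLE sales, DESCRIPTOR(ts), INTERVAL '1' DAY)) " +
                "GROUP BY window_start, window_end")
            .print();
    }
}
```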

Based on the docs, a temporary table does not survive when the Flink job stops. What would happen, then, if we redeploy the Flink job, say for a bug fix?

Thanks!

Alfred

1 Answer

A table does not store your data; it stores metadata, i.e., the table's name and location. In the case of a table backed by Kafka, for example, that means the broker address and topic name.

It’s fine to use temporary tables. But if you want to share this metadata with other applications, then it’s convenient to store it in a catalog and use permanent tables.
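
For illustration, a minimal sketch of the permanent-table route, assuming a Hive Metastore is available; the catalog name, Hive conf directory, and connector options are placeholders (and you'd need the flink-connector-hive dependency on the classpath):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class PermanentSalesTable {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Register a Hive catalog; table metadata stored there outlives this
        // session and is visible to other applications using the same metastore.
        HiveCatalog hive = new HiveCatalog("myhive", "default", "/opt/hive-conf");
        tEnv.registerCatalog("myhive", hive);
        tEnv.useCatalog("myhive");

        // No TEMPORARY keyword: this table's metadata persists in the catalog.
        tEnv.executeSql(
                "CREATE TABLE IF NOT EXISTS sales (" +
                "  amount DECIMAL(10, 2)," +
                "  ts TIMESTAMP(3)" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'sales'," +
                "  'properties.bootstrap.servers' = 'broker:9092'," +
                "  'format' = 'json'" +
                ")");
    }
}
```

Either way, the data itself stays in Kafka; dropping the table (temporary or permanent) only removes the metadata.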

David Anderson
  • So just to clarify, the main reason to use a permanent table is to share the metadata? Also, could you help answer what would happen during a Flink application deployment (say, for bug fixes)? In particular, if there is a table schema change, will the Flink runtime recognize it and continue the job from the previous checkpoint? – Alfred Oct 04 '21 at 04:04
  • Yes, sharing the metadata is the main (and perhaps only) reason to use permanent tables. As for table schema changes, the Flink runtime will try to migrate the state, but may not succeed -- the Row type doesn't (yet) support schema evolution, and other state in your snapshots can also become incompatible if you make changes to your queries. – David Anderson Oct 04 '21 at 12:56