I currently have a flow using QueryDatabaseTable that reads from a DB and writes the data to HDFS. I chose QueryDatabaseTable because:

  • it keeps state, which I use for delta loads
  • it allows fine-tuning when tables hold hundreds of millions of records.

I now have 100 tables that require the same flow (DB => HDFS), and I do not want to create that flow 100 times. I have looked into ListDatabaseTables, which would be perfect, but it seems QueryDatabaseTable doesn't take any input.

Has anyone encountered something similar?

bp2010

1 Answer

QueryDatabaseTable is meant to do incremental loading of a table and therefore has to maintain state about the table so it can know what to retrieve on the next execution. As a result, it can't allow dynamic tables because then there is an infinite amount of state that needs to be kept.

ListDatabaseTables is meant to be used more with GenerateTableFetch and ExecuteSQL to do bulk loading of a DB table.

Bryan Bende
  • Thanks for the answer, but GenerateTableFetch also keeps state, right? It uses the same `Maximum-value Columns` property as QueryDatabaseTable. – bp2010 Sep 16 '19 at 13:39
  • Yes, GenerateTableFetch keeps state as well. You'll just want to include an attribute on each flow file that specifies the max-value column(s) for that table, and use that attribute in an Expression Language expression for the Maximum-value Columns property. – mattyb Sep 16 '19 at 13:51
  • This answer says "QueryDatabaseTable can't allow dynamic tables because then there is an infinite amount of state that needs to be kept." So how does GenerateTableFetch differ, given that it keeps the same state? – bp2010 Sep 16 '19 at 14:00
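
To make the pattern from the answer and mattyb's comment concrete, here is a rough sketch of how a single flow might serve all 100 tables. It is an illustration, not a tested configuration. The attribute name `maxvalue.column` is hypothetical, and the UpdateAttribute and PutHDFS steps are assumptions about how the per-table column gets set and how the data reaches HDFS. `db.table.name` is one of the attributes ListDatabaseTables writes on each flow file it emits.

```
ListDatabaseTables
    # emits one flow file per table, carrying db.table.name
        |
UpdateAttribute
    # hypothetical attribute; how each table maps to its column is up to you
    maxvalue.column       = <max-value column for this table>
        |
GenerateTableFetch
    Table Name            = ${db.table.name}
    Maximum-value Columns = ${maxvalue.column}
        |
ExecuteSQL
    # runs the SQL statement carried in each incoming flow file
        |
PutHDFS
```

Because both the table name and the max-value column arrive as flow file attributes, the same five processors handle every table that ListDatabaseTables returns.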