
I am working on a NiFi data flow where my use case is to fetch MySQL table data and put it into HDFS/the local file system.

I have built a data flow pipeline where I used the QueryDatabaseTable processor ------ ConvertRecord --- PutFile processor.

My table schema ---> id, name, city, Created_date

I am able to receive files at the destination even when I insert new records into the table.

But when I update existing rows, the processor does not fetch those records. It looks like it has some limitation.

My question is: how do I handle this scenario? Either with some other processor, or by updating some property?

Please, someone help. @Bryan Bende

vipin chourasia

1 Answer


The QueryDatabaseTable processor needs to be told which columns it can use to identify new data.

A serial id or a creation timestamp alone is not sufficient to detect updates.

From the documentation:

Maximum-value Columns:

A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running. Using multiple columns implies an order to the column list, and each column's values are expected to increase more slowly than the previous columns' values. Thus, using multiple columns implies a hierarchical structure of columns, which is usually used for partitioning tables. This processor can be used to retrieve only those rows that have been added/updated since the last retrieval. Note that some JDBC types such as bit/boolean are not conducive to maintaining maximum value, so columns of these types should not be listed in this property, and will result in error(s) during processing. If no columns are provided, all rows from the table will be considered, which could have a performance impact. NOTE: It is important to use consistent max-value column names for a given table for incremental fetch to work properly.
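For illustration, the incremental-fetch logic described above can be sketched outside NiFi. This is not NiFi code: the table and column names follow the question's schema, and sqlite3 stands in for MySQL. The point is that an UPDATE which does not change the max-value column is invisible to this kind of query.

```python
import sqlite3

# Rough sketch of the max-value logic QueryDatabaseTable applies:
# remember the largest value of the tracked column seen so far, and
# on each run select only rows whose value exceeds it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, name TEXT, city TEXT, created_date TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, "a", "x", "2019-02-01"), (2, "b", "y", "2019-02-02")],
)

def fetch_new(conn, last_max):
    """Return rows added since the last run, plus the new max value."""
    if last_max is None:
        rows = conn.execute("SELECT * FROM mytable").fetchall()
    else:
        rows = conn.execute(
            "SELECT * FROM mytable WHERE created_date > ?", (last_max,)
        ).fetchall()
    if rows:
        last_max = max(r[3] for r in rows)
    return rows, last_max

rows, last_max = fetch_new(conn, None)          # first run: both rows
conn.execute("INSERT INTO mytable VALUES (3, 'c', 'z', '2019-02-03')")
new_rows, last_max = fetch_new(conn, last_max)  # second run: only id 3

# An UPDATE that leaves created_date untouched is invisible to this logic:
conn.execute("UPDATE mytable SET city = 'q' WHERE id = 1")
missed, last_max = fetch_new(conn, last_max)    # returns no rows
```

This is exactly why updated rows never show up when only a creation timestamp is tracked.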

Judging by the table schema, there is no SQL way of telling whether a row was updated.

There are many ways to solve this. In your case, the easiest thing to do might be to rename the column created to modified and set it to now() on updates, or to work with a second timestamp column.

So, for instance, the new column to add is:

| Field         | Type      | Default           | Extra                       |
|---------------|-----------|-------------------|-----------------------------|
| stamp_updated | timestamp | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |

In the processor, you then use the stamp_updated column to identify new data.

Don't forget to set Maximum-value Columns to that column.
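On MySQL, the column above could be added with a single ALTER TABLE (shown in the comment below; mytable and stamp_updated are placeholder names). Since SQLite lacks ON UPDATE CURRENT_TIMESTAMP, this runnable sketch emulates the same on-update behavior with a trigger, just to demonstrate the effect:

```python
import sqlite3

# On MySQL the equivalent would be a single statement (placeholder names):
#   ALTER TABLE mytable
#     ADD COLUMN stamp_updated TIMESTAMP
#       DEFAULT CURRENT_TIMESTAMP
#       ON UPDATE CURRENT_TIMESTAMP;
#
# SQLite has no ON UPDATE CURRENT_TIMESTAMP, so a trigger emulates it here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, city TEXT, "
    "stamp_updated TEXT DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute("""
CREATE TRIGGER touch_stamp AFTER UPDATE ON mytable
BEGIN
    UPDATE mytable SET stamp_updated = CURRENT_TIMESTAMP WHERE id = NEW.id;
END
""")

# Insert a row with an old timestamp, then update another column:
conn.execute("INSERT INTO mytable VALUES (1, 'x', '2019-01-01 00:00:00')")
conn.execute("UPDATE mytable SET city = 'y' WHERE id = 1")
stamp = conn.execute(
    "SELECT stamp_updated FROM mytable WHERE id = 1"
).fetchone()[0]
# stamp now reflects the time of the UPDATE, so a max-value check on
# stamp_updated will pick the row up again on the next run.
```

With such a column in place, updated rows advance the max value just like inserts do.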

So what I am basically saying is:

If you cannot tell in SQL that a record is new or changed, NiFi cannot either.

  • Hi Chris, thanks for the reply. I am using the created-date timestamp column as the Maximum-value Column and the column list as Columns to Return. With that property set, I am able to get incremental updates. Fine... But as you said, [the easiest thing to do might be to rename column created to modified and set to now() on updates or to work with a second timestamp column.] Say I have added one more timestamp column to the table to handle updated rows; my concern now is how to handle this in the QueryDatabaseTable processor, since there is no property to set an update query. Could you please provide some example? – vipin chourasia Feb 19 '19 at 07:09
  • Thanks Chris, I am able to receive updated records now. Thank you so much. I will let you know if I get stuck on another issue. :) – vipin chourasia Feb 20 '19 at 07:29