
I have an Auto Loader stream processing a mount point with CSV files. After each run, I would like to insert some of the records into another table that has an auto-increment identity column set up.

I can rerun the entire insert and that works, but I would like to insert only the newest records.

I have CDF enabled, so I should be able to determine the latest version, or keep track of the versions already processed, but it seems like I am missing some built-in feature of Databricks.

Any suggestions or sample to look at?

1 Answer


Note: Delta Change Data Feed is available in Databricks Runtime 8.4 and above.

You can read the change events in batch queries using the SQL and DataFrame APIs (that is, spark.read), and in streaming queries using the DataFrame APIs (that is, spark.readStream).
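
For example (a minimal sketch; the table name silverTable comes from the snippet below, and the version numbers are placeholders you would replace), a batch read and a streaming read of the change feed look like this in PySpark:

# Batch: read change events between two explicit table versions
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 2)
           .option("endingVersion", 5)
           .table("silverTable"))

# Streaming: read change events continuously, starting from a given version
changes_stream = (spark.readStream.format("delta")
                  .option("readChangeFeed", "true")
                  .option("startingVersion", 2)
                  .table("silverTable"))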

Enable CDF

%sql
ALTER TABLE silverTable SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
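
To insert only the newest records into the table with the identity column, one option (a sketch under assumptions, not taken from the question or the linked notebook: targetTable, the checkpoint path, and the starting version are placeholders) is to stream the change feed and let the streaming checkpoint remember which table versions have already been processed, so each run only sees new commits:

from pyspark.sql import functions as F

def append_new_rows(batch_df, batch_id):
    # Keep only freshly inserted rows from the change feed and drop the CDF
    # metadata columns; the identity column on the target table is assumed to
    # be generated automatically, so it is not supplied here.
    new_rows = (batch_df
                .filter(F.col("_change_type") == "insert")
                .drop("_change_type", "_commit_version", "_commit_timestamp"))
    new_rows.write.format("delta").mode("append").saveAsTable("targetTable")

(spark.readStream.format("delta")
 .option("readChangeFeed", "true")
 .option("startingVersion", 0)        # placeholder starting point
 .table("silverTable")
 .writeStream
 .foreachBatch(append_new_rows)
 .option("checkpointLocation", "/mnt/checkpoints/silver_to_target")  # placeholder path
 .trigger(once=True)                  # process whatever is new, then stop
 .start())

Because the checkpoint tracks the processed versions for you, there is no need to store the last version yourself; rerunning the stream only processes commits made since the previous run.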

Any suggestions or sample to look at?

You can find a sample notebook here

Abhishek K