I am having a databricks delta table created on data lake storage which holds data as shown below.
db_name table_name location table_format table_type load_ts
--------------------------------------------------------------------------------------------------------
abc table1 dbfs:/mnt/abc/data/table1 delta EXTERNAL 2022-09-14T18:48:02.859+0000
abc table2 dbfs:/mnt/abc/data/table2 delta EXTERNAL 2022-09-14T18:48:02.859+0000
xyz table1 dbfs:/mnt/xyz/data/table1 delta EXTERNAL 2022-09-14T18:48:02.859+0000
xyz table2 dbfs:/mnt/xyz/data/table2 delta EXTERNAL 2022-09-14T18:48:02.859+0000
xyz table3 dbfs:/mnt/xyz/data/table3 delta EXTERNAL 2022-09-14T18:48:02.859+0000
--------------------------------------------------------------------------------------------------------
Currently I am running this script daily to overwrite the complete table in databrcicks. But my requirement is, this records should update only if there is any change occurred on that particular record else leave it. If there is any new records, it has to be added.
For Ex: Below is the input data frame I get whenever I run the script.
There is a change for one of the record and one new record gets added. only updated record should get updated and new record should be added to the final table.
The output table is expected to be updated as below.
I tried delta merge option but it updates all matching record which is not expected.
I have to keep the load_ts untouched if there NO update/insertion of new record happens (This is to track the time when any update is happened to that record). If any record is updated, ONLY it should update in target table.
Is there any way/logic to achieve this ?
Any leads Appreciated!