0

I need to update delta table on the basis of update delta table rows.

Update table (source_df)
+----------------------------------------------------------
|ID|  NAME|ADDRESS|DELETE_FLAG|INSERT_DATE|UPDATE_DATE|
+----------------------------------------------------------
| 1|sravan|delhi  |false      |02/02/2023 |02/02/2023|
| 3|rohith|jaipur |false      |02/02/2023 |02/02/2023|

Delta table (delta_df)
+----------------------------------------------------------
|ID|  NAME|ADDRESS|DELETE_FLAG|INSERT_DATE|UPDATE_DATE|
+----------------------------------------------------------
| 1|sravan|delhi  |false      |25/01/2023 |25/01/2023|
| 2|ojasvi|patna  |false      |25/01/2023 |25/01/2023|
| 3|rohith|jaipur |false      |25/01/2023 |25/01/2023|

I want to update the delta table (both are delta table instances and not dataframes)

Final table would look like this,

Delta table (Updated)

+----------------------------------------------------------
|ID|  NAME|ADDRESS|DELETE_FLAG|INSERT_DATE|UPDATE_DATE|
+----------------------------------------------------------
| 1|sravan|delhi  |false      |25/01/2023 |25/01/2023|
| 2|ojasvi|patna  |true       |25/01/2023 |02/02/2023|
| 3|rohith|jaipur |false      |25/01/2023 |25/01/2023|

Let the keys for both tables be ID and NAME, I tried the following,

delta_df.merge(
        source_df,
        f"delta_df.DELETE_FLAG= 0 AND (delta_df.ID <> source_df.ID AND delta_df.NAME <> source_df.NAME)"
    ).whenMatchedUpdate(
        set = {
            "DELETE_FLAG" : "1",
            "UPDATE_DATE" : "source_df.UPDATE_DATE"
        }
    ).execute()

However, the above code is taking infinite time to execute. I am new to PySPark and not sure what is the most efficient way to deal with the problem at hand.

Note: They are not dataframes.

royalewithcheese
  • 402
  • 4
  • 17
  • Does this answer your question? [Delta merge logic whenMatchedDelete case](https://stackoverflow.com/questions/66953143/delta-merge-logic-whenmatcheddelete-case) – Jonathan Lam Feb 02 '23 at 03:06
  • I tried this approach still doesn't seem to work as I want to update the the value of deltatable. – royalewithcheese Feb 02 '23 at 03:29

0 Answers0