Modify Code to support Delete in Delta Lake Tables with Databricks

Question

In Databricks SQL and Databricks Runtime 12.1 and above, you can use the WHEN NOT MATCHED BY SOURCE clause to UPDATE or DELETE records in the target table that do not have corresponding records in the source table. Databricks recommends adding an optional conditional clause to avoid fully rewriting the target table.

The following code example shows allows for updates and new rows to be reflected in the target table. Can someone help modify the following code to allow for deletes to also be reflected in the target table? I think the correct to express this requirement is overwriting the target table with the contents of the source table and deleting unmatched records in the target table.

try:
  #Perform a merge into the existing table
  if allowDuplicates == "true":
    (deltadf.alias("t")
       .merge(
        partdf.alias("s"),
        f"s.primary_key_hash = t.primary_key_hash")
      .whenNotMatchedInsertAll()
     .execute()
    )
  else:
    (deltadf.alias("t")
       .merge(
        partdf.alias("s"),
        "s.primary_key_hash = t.primary_key_hash")
      .whenMatchedUpdateAll("s.change_key_hash <> t.change_key_hash")
      .whenNotMatchedInsertAll().
     execute()
    )
  
  action = f"Merged into Existing Delta for table {entityName} at path: {saveloc}"

  spark.sql(f"OPTIMIZE {stageName}{regName}")  
except Exception as e:
  print(e)

Does this answer your question? [Databricks 'DeltaMergeBuilder' object has no attribute 'whenNotMatchedBySourceDelete'](https://stackoverflow.com/questions/75732607/databricks-deltamergebuilder-object-has-no-attribute-whennotmatchedbysourcede) — Alex Ott, Mar 23 '23 at 08:53

score 1 · Accepted Answer · answered Mar 23 '23 at 08:54

There is a new function added to the Python API: .whenNotMatchedBySourceDelete()

You need to add it to your code:

(deltadf.alias("t")
       .merge(
        partdf.alias("s"),
        "s.primary_key_hash = t.primary_key_hash")
      .whenMatchedUpdateAll("s.change_key_hash <> t.change_key_hash")
      .whenNotMatchedInsertAll()
      .whenNotMatchedBySourceDelete()
      .execute()
    )

But you need to use at least DBR 12.1 - it's not available in the lower versions

Modify Code to support Delete in Delta Lake Tables with Databricks

1 Answers1