0

In Databricks SQL and Databricks Runtime 12.1 and above, you can use the WHEN NOT MATCHED BY SOURCE clause to UPDATE or DELETE records in the target table that do not have corresponding records in the source table. Databricks recommends adding an optional conditional clause to avoid fully rewriting the target table.

The following code example shows allows for updates and new rows to be reflected in the target table. Can someone help modify the following code to allow for deletes to also be reflected in the target table? I think the correct to express this requirement is overwriting the target table with the contents of the source table and deleting unmatched records in the target table.

try:
  #Perform a merge into the existing table
  if allowDuplicates == "true":
    (deltadf.alias("t")
       .merge(
        partdf.alias("s"),
        f"s.primary_key_hash = t.primary_key_hash")
      .whenNotMatchedInsertAll()
     .execute()
    )
  else:
    (deltadf.alias("t")
       .merge(
        partdf.alias("s"),
        "s.primary_key_hash = t.primary_key_hash")
      .whenMatchedUpdateAll("s.change_key_hash <> t.change_key_hash")
      .whenNotMatchedInsertAll().
     execute()
    )
  
  action = f"Merged into Existing Delta for table {entityName} at path: {saveloc}"

  spark.sql(f"OPTIMIZE {stageName}{regName}")  
except Exception as e:
  print(e)
Alex Ott
  • 80,552
  • 8
  • 87
  • 132
Patterson
  • 1,927
  • 1
  • 19
  • 56
  • you mean delete in `WHEN NOT MATCHED BY SOURCE` ? – Alex Ott Mar 19 '23 at 12:16
  • Hi Alex, yes that is correct – Patterson Mar 19 '23 at 17:36
  • Does this answer your question? [Databricks 'DeltaMergeBuilder' object has no attribute 'whenNotMatchedBySourceDelete'](https://stackoverflow.com/questions/75732607/databricks-deltamergebuilder-object-has-no-attribute-whennotmatchedbysourcede) – Alex Ott Mar 23 '23 at 08:53

1 Answers1

1

There is a new function added to the Python API: .whenNotMatchedBySourceDelete()

You need to add it to your code:

(deltadf.alias("t")
       .merge(
        partdf.alias("s"),
        "s.primary_key_hash = t.primary_key_hash")
      .whenMatchedUpdateAll("s.change_key_hash <> t.change_key_hash")
      .whenNotMatchedInsertAll()
      .whenNotMatchedBySourceDelete()
      .execute()
    )

But you need to use at least DBR 12.1 - it's not available in the lower versions

Alex Ott
  • 80,552
  • 8
  • 87
  • 132