In Databricks SQL and Databricks Runtime 12.1 and above, you can use the WHEN NOT MATCHED BY SOURCE clause to UPDATE or DELETE records in the target table that do not have corresponding records in the source table. Databricks recommends adding an optional conditional clause to avoid fully rewriting the target table.
The following code example shows allows for updates and new rows to be reflected in the target table. Can someone help modify the following code to allow for deletes to also be reflected in the target table? I think the correct to express this requirement is overwriting the target table with the contents of the source table and deleting unmatched records in the target table.
try:
#Perform a merge into the existing table
if allowDuplicates == "true":
(deltadf.alias("t")
.merge(
partdf.alias("s"),
f"s.primary_key_hash = t.primary_key_hash")
.whenNotMatchedInsertAll()
.execute()
)
else:
(deltadf.alias("t")
.merge(
partdf.alias("s"),
"s.primary_key_hash = t.primary_key_hash")
.whenMatchedUpdateAll("s.change_key_hash <> t.change_key_hash")
.whenNotMatchedInsertAll().
execute()
)
action = f"Merged into Existing Delta for table {entityName} at path: {saveloc}"
spark.sql(f"OPTIMIZE {stageName}{regName}")
except Exception as e:
print(e)