I am doing an upsert operation in Databricks. Now I want to check what has changed between two upsert operations.
My original df1 looks like this >>
My upserted df2 looks like this >>
Here id is my primary key.
A simpler approach would be to enable Change Data Feed on the Delta table. You can then read all changes between two table versions (inserts, updates, and deletes), and consume them either as a batch or as a stream:
# Read every change recorded between table versions 0 and 10.
all_changes = spark.read.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingVersion", 0) \
    .option("endingVersion", 10) \
    .table("myDeltaTable")

# Split the feed by the `_change_type` metadata column.
inserts = all_changes.filter("_change_type = 'insert'")
updates = all_changes.filter("_change_type = 'update_postimage'")
deletes = all_changes.filter("_change_type = 'delete'")
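Note that the read above only returns changes made after Change Data Feed was enabled on the table. The following is a minimal sketch, not a definitive implementation, of enabling the feed and then pairing the before and after image of each updated row, so you can see what an upsert actually changed. It assumes the table is named `myDeltaTable` as above, that `id` is the primary key, and that `spark` is the active session; the helper names `enable_change_feed` and `updated_rows` are hypothetical.

```python
def enable_change_feed(spark, table_name):
    """Turn on Change Data Feed for an existing Delta table."""
    spark.sql(
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def updated_rows(spark, table_name, start_version, end_version):
    """Return old and new values side by side for every updated row."""
    changes = (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", start_version)
        .option("endingVersion", end_version)
        .table(table_name)
    )
    # Each update produces two rows in the feed: the row before and after.
    before = changes.filter("_change_type = 'update_preimage'").alias("old")
    after = changes.filter("_change_type = 'update_postimage'").alias("new")
    # Pair the images from the same commit that share the primary key `id`
    # (assumption: `id` uniquely identifies a row, as stated in the question).
    return before.join(after, on=["id", "_commit_version"])
```

In the result, the remaining columns can be compared through the aliases, for example `old.<column>` against `new.<column>`. For streaming consumption, the same `readChangeFeed` option can be set on `spark.readStream` instead of `spark.read`.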