
I am doing an upsert operation in Databricks. Now I want to check what changed between two upsert operations.

My original df1 looks like this >> My 1st Dataframe

My upserted df2 looks like this >> My 2nd Dataframe

I want output like this >> Output Dataframe

Here, id is my primary key.

Alex Ott
Suraj Shejal

1 Answer


A simpler approach is to enable Change Data Feed on the Delta table (set the table property delta.enableChangeDataFeed to true). You can then read all changes (inserts, updates, and deletes) between two table versions, and consume them either as a batch or as a stream:

# Read the change feed between two table versions (both bounds are inclusive)
all_changes = spark.read.format("delta") \
  .option("readChangeFeed", "true") \
  .option("startingVersion", 0) \
  .option("endingVersion", 10) \
  .table("myDeltaTable")
# The change type is stored in the _change_type column (single leading underscore)
inserts = all_changes.filter("_change_type = 'insert'")
updates = all_changes.filter("_change_type = 'update_postimage'")
deletes = all_changes.filter("_change_type = 'delete'")
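
To see which columns actually changed for an updated row, one option is to pair each update_preimage row with its update_postimage row on the primary key and commit version. The sketch below is only an illustration in plain Python on rows that have already been collected as dicts. The function name diff_updates and the sample columns name and city are hypothetical, since the original dataframes are not shown; only id, _change_type and _commit_version come from the question and the change feed.

```python
from collections import defaultdict

# Metadata columns that the change feed adds to every row.
CDF_META = {"_change_type", "_commit_version", "_commit_timestamp"}


def diff_updates(change_rows, key="id"):
    """Pair pre/post update images by (key, commit version) and
    return the columns whose values differ between the two images."""
    images = defaultdict(dict)
    for row in change_rows:
        change_type = row["_change_type"]
        if change_type in ("update_preimage", "update_postimage"):
            images[(row[key], row["_commit_version"])][change_type] = row

    diffs = []
    for (key_value, version), pair in images.items():
        before = pair.get("update_preimage")
        after = pair.get("update_postimage")
        if before is None or after is None:
            continue  # incomplete pair, nothing to compare
        changed = {
            col: (before.get(col), after[col])
            for col in after
            if col not in CDF_META and col != key and before.get(col) != after[col]
        }
        diffs.append({key: key_value, "version": version, "changed": changed})
    return diffs


# Hypothetical sample rows, shaped like change feed output.
sample = [
    {"id": 1, "name": "a", "city": "Pune",
     "_change_type": "update_preimage", "_commit_version": 2},
    {"id": 1, "name": "a", "city": "Mumbai",
     "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 3, "name": "c", "city": "Delhi",
     "_change_type": "insert", "_commit_version": 2},
]

print(diff_updates(sample))
```

With a real change feed, the input could be built with something like [r.asDict() for r in all_changes.collect()], which is only sensible for small result sets. For large tables the same pairing would be better expressed as a Spark join on id and _commit_version.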
Alex Ott