I want to update my target Delta table in databricks when certain column values in a row matches with same column values in Source table.
The problem is when I have multiple rows in source table that matches one row in target Delta table.
This is a scenario where primary keys of two or more rows in source table matches with primary keys of one row in delta table. I have tried to replicate the scenario below:
sql="""
MERGE INTO """ + targetTable + """ TGT USING """ + tempView + """ SRC ON TGT.id = SRC.id and TGT.category != SRC.category and TGT.startdt = SRC.startdt
WHEN MATCHED THEN DELETE"""
targetDF = spark.createDataFrame([{"id": "5001","category": "N1","startDt": "2019-09-30 00:00:00.000"}])
sourceDF = spark.createDataFrame([{"id": "5001","category": "E1","startDt": "2019-09-30 00:00:00.000"},{"id": "5001","category": "B1","startDt": "2019-09-30 00:00:00.000"}])
targetDF.write.format("delta").mode("overwrite").saveAsTable("test.targetDF")
sourceDF.createOrReplaceTempView("tempView")
sqlOut=spark.sql(sql)
display(spark.sql("select * from test.targetDelta"))
I have tried left join(targetTable left join sourceTable) on two tables where my id and startDt matches to get one row in my targetTable which I would like to delete but couldn't figure out how to do that.
spark.sql("""Select TGT.id from test.targetDF TGT left join tempView SRC ON TGT.id = SRC.id and TGT.startDt= SRC.startDt""")
Thanks in advance.