I am looking for a smarter way to perform an insert into a Delta table based on a condition, an "insert when matched", where I don't need to fake skipping the update part of the merge with update_condition = "true = false".
I wasn't able to find anything. I assume the options within sdf.write.format("delta").mode("append").options(...) might offer a solution, but I couldn't find any documentation on which options are supported.
I am using Databricks on Azure with runtime 11.3 LTS.
Let's say I have the following tables:
sdf1 = spark.createDataFrame(
    [
        (1, "foo", "dd", "1", "99"),
        (2, "bar", "2sfs", "1", "99"),
    ],
    ["id", "col", "col2", "s_date", "e_date"],
)

sdf2 = spark.createDataFrame(
    [
        (1, "foo", "dd", "1", "99"),
        (2, "bar", "2sfs", "33", "99"),
        (3, "bar", "dwdw", "3", "5"),
    ],
    ["id", "col", "col2", "s_date", "e_date"],
)
My expectation is to add only the rows from sdf2 that satisfy a condition, let's say target.id <> source.id in this case: row 3 should be appended, while row 2 (whose s_date differs) stays untouched.
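To make that expectation concrete, here is the same set logic spelled out in plain Python (just a sanity check, not Spark API; the tuples mirror the two DataFrames above and assume id alone identifies a row):

```python
# Sanity check of the expected outcome using plain Python, assuming "id"
# is the join key: sdf2 rows whose id already exists in sdf1 are left
# untouched, and only genuinely new ids are inserted.
target_rows = [
    (1, "foo", "dd", "1", "99"),
    (2, "bar", "2sfs", "1", "99"),
]
source_rows = [
    (1, "foo", "dd", "1", "99"),
    (2, "bar", "2sfs", "33", "99"),
    (3, "bar", "dwdw", "3", "5"),
]

target_ids = {row[0] for row in target_rows}
rows_to_insert = [row for row in source_rows if row[0] not in target_ids]
expected = target_rows + rows_to_insert
# rows_to_insert contains only the id=3 row
```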
I mocked up the "insert when matched" logic myself:
from delta.tables import DeltaTable

insert_condition = "target.id <> source.id"
merge_condition = f"not ({insert_condition})"  # i.e. "target.id = source.id"
update_condition = "true = false"  # never true, so matched rows are never updated
insert_condition = None  # insert all not-matched rows unconditionally
delta_path = "/mnt/raw/testNiko/matchedInsert"

# write sdf1
sdf1.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(delta_path)

# "insert when matched" sdf2
delta_table = DeltaTable.forPath(spark, delta_path)
delta_merge_builder = delta_table.alias("target").merge(sdf2.alias("source"), merge_condition)
delta_merge_builder = delta_merge_builder.whenMatchedUpdateAll(update_condition)
delta_merge_builder = delta_merge_builder.whenNotMatchedInsertAll(insert_condition)
delta_merge_builder.execute()
sdf_merge = spark.read.format("delta").load(delta_path)
display(sdf_merge)
The expected result is there, but I'm looking for a smarter approach that does InsertWhenMatched where I don't need to fake skipping the update part of the merge with update_condition = "true = false".
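For completeness, the closest simplification I'm aware of: the whenMatched clause of a Delta merge is optional, so dropping it entirely leaves matched rows untouched without any fake update condition. This gives the same result here, though it is strictly a "not matched" insert rather than a true InsertWhenMatched (sketch only, assuming the spark session and delta_path from above):

```python
from delta.tables import DeltaTable

# Sketch: merge with only a whenNotMatchedInsertAll clause. Merge clauses
# are optional, so matched target rows are simply left as-is; only source
# rows with no matching target.id are inserted.
delta_table = DeltaTable.forPath(spark, delta_path)
(
    delta_table.alias("target")
    .merge(sdf2.alias("source"), "target.id = source.id")
    .whenNotMatchedInsertAll()
    .execute()
)
```

This avoids the "true = false" trick, but it still doesn't answer whether a condition-driven insert on *matched* rows is possible.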