0

I am looking for a smarter way to perform an insert into a delta table based on a condition that does InsertWhenMatched where I don't need to fake skipping the update part of the merge with the update_condition = "true = false".

I wasn't able to find something. I assume the options within sdf.format("delta").mode("append").options(***) might give a solution but couldn't find any docu which options are supported.

I am using Databricks on Azure with runtime 11.3 LTS

Let's say I have the follwing tables

sdf1 = spark.createDataFrame(
 [
  (1, "foo", "dd", "1", "99"),
  (2, "bar", "2sfs", "1", "99"),
 ],
 ["id", "col", "col2", "s_date", "e_date"],
)

sdf2 = spark.createDataFrame(
 [
  (1, "foo", "dd", "1", "99"),
  (2, "bar", "2sfs", "33", "99"),
 (3, "bar", "dwdw", "3", "5"),
 ],
 ["id", "col", "col2", "s_date", "e_date"],
)

My expectation is to add only the rows from the second sdf2 based on a condition. let's say in the case target.id <> source.id

I dummy created an insert when matched logic by myself

from delta.tables import DeltaTable
insert_condition = "target.id <> source.id"
merge_condition = f"not ({insert_condition})"
#merge_condition = "target.id = source.id"
update_condition = "true = false"
insert_condition = None
delta_path = delta_path = "/mnt/raw/testNiko/matchedInsert"

#write SDF1`
sdf1.write.format("delta").mode("overwrite").option("overwriteSchema", "True").save(delta_path)

#Insert when matched SDF2
delta_table = DeltaTable.forPath(spark, delta_path)
delta_merge_builder = delta_table.alias("target").merge(sdf2.alias("source"), merge_condition)

delta_merge_builder = delta_merge_builder.whenMatchedUpdateAll(update_condition)
delta_merge_builder = delta_merge_builder.whenNotMatchedInsertAll(insert_condition)
delta_merge_builder.execute()

sdf_merge = spark.read.format("delta").load(delta_path)
display(sdf_merge)

The expected result is there but looking forward to a smarter idea which does InsertWhenMatched where I don't need to fake skipping the update part of the merge with the update_condition = "true = false"

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
Nikolaos
  • 21
  • 2
  • 1
    ask question in a clear and crispy formatted way with out which its difficult to answer general questions What is meaning of smart ? spell it – Ram Ghadiyaram Feb 08 '23 at 02:09
  • @RamGhadiyaram thanks a lot I added it: The expected result is there but looking forward to a smarter idea which does InsertWhenMatched where I don't need to fake skipping the update part of the merge with the update_condition = "true = false" – Nikolaos Feb 08 '23 at 11:32

0 Answers0