0

We have a generic delta lake merge code where it has both update and insert in a single execute statement

I have some situation where

  1. I need to update the values in table (insert should be blank)
  2. I need to insert the values in table
##only Update
df_look.alias("dt").merge(
df_o.alias("lt"),
condition="system_id=lt.system_id") \
.whenMatchedUpdate(set ={"dt.update_tmst":current_timestamp(),"active_ind":lit("N")}) \
.whenNotMatchedInsert("DO NOTHING").execute()


##only Insert
df_look.alias("dt").merge(
df_o.alias("lt"),
condition="system_id=lt.system_id") \
.whenMatchedUpdate(set ={"dt.update_tmst":current_timestamp(),"active_ind":lit("Y")}) \
.whenNotMatchedInsert(values={"dt.system_id": col("lt.system_id")).execute()

I tried the below option


df_look.alias("dt").merge(
df_o.alias("lt"),
condition="dt.system_id=lt.system_id and dt.S4=lt.S4") \
.whenMatchedUpdate(set ={"dt.update_tmst":current_timestamp(),"active_ind":lit("N")}) \
.whenNotMatchedInsert((values={})).execute()

I got he below error

Unable to find the column 'type_cd' of the target table from the INSERT columns: . INSERT clause must specify value for all the columns of the target table

Is there any way in single statement we can use update and make insert blank and next statement we can use only insert and update condition blank . Will the delta lake supports ?

Joe
  • 47
  • 7

0 Answers0