0

Scenario:

df1 ---> Col1,Col2,Col3 -- which are the columns in the delta table

df2 ---> Col1,Col2,Col3,Col4,Col5 -- which are the columns in the latest refresh table

  1. How to get the new columns (in the above Col4,Col5) with datatypes dynamically.
  2. How to alter the existing Delta table to include the new columns (in the above Col4,Col5) dynamically and update the new column values

Thanks for your help.

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
skp
  • 314
  • 1
  • 14
  • What do you mean under "get new columns with datatypes dynamically"? What is the source of refresh table? – Alex Ott Feb 13 '22 at 15:52
  • @Alex Ott - Thanks for response. Source of refresh table is the csv file from the blob storage, which is loaded and converted into the table with createOrReplaceTempView. when I say dynamically means something that is equal to in sql - select column_name,data_type from information_schema.columns where table = createOrReplaceTempViewtablename. In Databricks we have printschema, Describe to see the metadata of the table but how to get the columns and datatypes like in SQL so that I can use forloop to ALTER TABLE deltatable ADD {column_name} {data_type}" – skp Feb 13 '22 at 17:25

1 Answers1

2

You don't need to perform explicit ALTER TABLE if you have Delta table - you just need to use built-in capabilities for schema evolution (blog post) - just add the mergeSchema option with value true, and Delta will take care for updating schema. For example, if I have initial table with two fields: i1 & i2:

df1 = spark.createDataFrame([[1,2]], schema="i1 int, i2 int")
df1.write.format("delta").mode("overwrite").saveAsTable("test")

then I can update it even if I have more columns:

df2 = spark.createDataFrame([[1,2, '1']], schema="i1 int, i2 int, v string")
df2.write.format("delta").mode("append") \
  .option("mergeSchema", "true").saveAsTable("test")

You can find more information about schema evolution & enforcement in the following blog post and this webinar.

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
  • Thanks for your help it worked. Is there any way that we can do this with merge with condition matched and unmatched for updates and inserts. – skp Feb 16 '22 at 16:38
  • yes, it's possible - you need to set a special property for that - see docs: https://docs.delta.io/latest/delta-update.html#automatic-schema-evolution – Alex Ott Feb 16 '22 at 17:13