
I am trying to create a Delta table in the DBSQL metastore from a Delta bucket, with one added column. I do not want to pass in the schema, as it may change in the bucket over time, but I do want to add a column to the metastore only, defined as a `generatedAlwaysAs` column so that it is populated with new values as the Delta bucket receives new data. This is my code, based on the Databricks documentation:

from delta.tables import DeltaTable
from pyspark.sql.types import DateType

DeltaTable.createIfNotExists(spark) \
    .tableName("golddata.table") \
    .addColumn("date", DateType(), generatedAlwaysAs="CAST(concat(year,month,'-01') AS DATE)") \
    .location("cloudBucket://golddata/table") \
    .execute()

This code gives a schema mismatch error. Is there a way to add a column in the Databricks SQL metastore to the existing schema that is being loaded from my Delta bucket? Will the `generatedAlwaysAs` column be updated when the data in the bucket is updated?

Prof. Falken

1 Answer


If you want to add columns to an existing Delta table, you have to set two options:

  • the `write` or `writeStream` call includes `.option("mergeSchema", "true")`
  • `spark.databricks.delta.schema.autoMerge.enabled` is set to `true`

With both of these in place, Delta should merge your extra column into the existing schema.
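A minimal sketch of the two settings above. It assumes a running Spark session with Delta Lake configured, and a DataFrame `new_df` that carries the additional column; the table path is taken from the question, and `new_df` is a hypothetical name:

```python
# Enable automatic schema merging session-wide (covers MERGE operations too).
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

# Append to the existing Delta table, letting the new column
# in new_df be merged into the table's schema.
(new_df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("cloudBucket://golddata/table"))
```

Note that these options apply to `write`/`writeStream` paths; `DeltaTableBuilder` itself does not expose them, which is what the comment below is getting at.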


Bartosz Gajda
  • I am using the DeltaTableBuilder to create this table in Databricks SQL hive from a formatted Delta Table in a storage bucket. This does not have attributes for `write` or `writeStream`. – Prof. Falken Nov 30 '22 at 14:33