
I am trying to create a Delta table in the DBSQL metastore from a Delta bucket, with one added column. I do not want to pass in the schema, as it may change in the bucket over time, but I do want to add a column to the metastore only, defined as a `generatedAlwaysAs` column so that it is populated with new values as the Delta bucket receives new data. This is my code, based on the Databricks documentation:

from delta.tables import DeltaTable
from pyspark.sql.types import DateType

DeltaTable.createIfNotExists(spark) \
    .tableName("golddata.table") \
    .addColumn("date", DateType(), generatedAlwaysAs="CAST(concat(year,month,'-01') AS DATE)") \
    .location("cloudBucket://golddata/table") \
    .execute()

This code gives a schema mismatch error. Is there a way to add a column in the Databricks SQL metastore to the existing schema that is being loaded from my Delta bucket? Will the `generatedAlwaysAs` column be updated when the data in the bucket is updated?

Prof. Falken

1 Answer


If you want to add columns to an existing Delta table, you have to set two options:

  • the `write` or `writeStream` call includes `.option("mergeSchema", "true")`
  • `spark.databricks.delta.schema.autoMerge.enabled` is set to `true`

With both of these in place, Delta should merge your extra column into the existing schema.
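A minimal sketch of the two settings above. It assumes a running Spark session with Delta Lake configured, and a DataFrame `new_df` that carries the additional column; the table path is taken from the question, and `new_df` is a hypothetical name:

```python
# Enable automatic schema merging session-wide (covers MERGE operations too).
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

# Append to the existing Delta table, letting the new column
# in new_df be merged into the table's schema.
(new_df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("cloudBucket://golddata/table"))
```

Note that these options apply to `write`/`writeStream` paths; `DeltaTableBuilder` itself does not expose them, which is what the comment below is getting at.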


Bartosz Gajda
  • I am using the DeltaTableBuilder to create this table in Databricks SQL hive from a formatted Delta Table in a storage bucket. This does not have attributes for `write` or `writeStream`. – Prof. Falken Nov 30 '22 at 14:33