I am trying to handle schema changes (a new type and a dropped column) in Delta tables. As per the documentation:

// df3 holds the data with the evolved schema
df3
  .write
  .format("delta")
  .mode("overwrite")
  .option("mergeSchema", "true")
  .save(deltapath)

This way I lose the old data. Is there any way to keep the old data before overwriting with the new schema, apart from taking a backup? I have posted the same query in the Databricks forum; link for reference: https://community.databricks.com/s/question/0D58Y00009WjZllSAF/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in-schema-evaluation

One way is to create a backup of the file before overwriting it, but I want to know the best way.

Ritz

1 Answer


The command you shared won't delete the existing data; the earlier version of the table remains queryable through Delta Lake time travel.

For example:

  1. Create a table with 5 rows

    spark.range(5).write.format("delta").save("/tmp/table")

  2. Overwrite with a new column & mergeSchema option

    import org.apache.spark.sql.functions.lit  // required for lit()

    spark.range(5)
      .withColumn("data", lit("data"))
      .write.format("delta")
      .mode("overwrite")
      .option("mergeSchema", "true")
      .save("/tmp/table")
    
  3. Query original data

    SELECT * FROM delta.`/tmp/table` VERSION AS OF 0

    +---+
    | id|
    +---+
    |  0|
    |  1|
    |  2|
    |  3|
    |  4|
    +---+
    
  4. Query new/current state

    SELECT * FROM delta.`/tmp/table` VERSION AS OF 1

    +---+----+
    | id|data|
    +---+----+
    |  0|data|
    |  1|data|
    |  2|data|
    |  3|data|
    |  4|data|
    +---+----+
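
Beyond ad-hoc queries, you can recover the pre-overwrite data programmatically. A minimal sketch, assuming the /tmp/table path and version numbers from the example above (RESTORE requires Delta Lake 1.2+ or Databricks Runtime 7.4+):

    // List the table's versions and the operation that produced each one
    spark.sql("DESCRIBE HISTORY delta.`/tmp/table`").show()

    // Read the pre-overwrite snapshot through the DataFrame reader
    val oldData = spark.read
      .format("delta")
      .option("versionAsOf", 0)
      .load("/tmp/table")

    // Or roll the whole table back to version 0 in place
    spark.sql("RESTORE TABLE delta.`/tmp/table` TO VERSION AS OF 0")

Note that time travel only works while the old data files still exist: running VACUUM with its default 7-day retention will eventually remove the files behind version 0, leaving only the current version readable.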
    
Nick Karpov