I am trying to handle schema changes (a new type and a dropped column) in Delta tables. As per the documentation:

// df3 holds the data with the evolved schema
df3
  .write
  .format("delta")
  .mode("overwrite")
  .option("mergeSchema", "true")
  .save(deltapath)

This way I lose the old data. Is there any way to keep the old data before overwriting with the new schema, apart from taking a backup? I have posted the same query in the Databricks forum; link for reference: https://community.databricks.com/s/question/0D58Y00009WjZllSAF/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in-schema-evaluation

One way is to create a backup of the file before overwriting it, but I want to know the best way.

Ritz

1 Answer


The command you shared won't delete the existing data; the earlier version of the table remains queryable through Delta Lake time travel.

For example:

  1. Create a table with 5 rows

    spark.range(5).write.format("delta").save("/tmp/table")

  2. Overwrite with a new column & mergeSchema option

    import org.apache.spark.sql.functions.lit  // required for lit()

    spark.range(5)
      .withColumn("data", lit("data"))
      .write.format("delta")
      .mode("overwrite")
      .option("mergeSchema", "true")
      .save("/tmp/table")
    
  3. Query original data

    SELECT * FROM delta.`/tmp/table` VERSION AS OF 0

    +---+
    | id|
    +---+
    |  0|
    |  1|
    |  2|
    |  3|
    |  4|
    +---+
    
  4. Query new/current state

    SELECT * FROM delta.`/tmp/table` VERSION AS OF 1

    +---+----+
    | id|data|
    +---+----+
    |  0|data|
    |  1|data|
    |  2|data|
    |  3|data|
    |  4|data|
    +---+----+
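
Beyond ad-hoc queries, you can recover the pre-overwrite data programmatically. A minimal sketch, assuming the /tmp/table path and version numbers from the example above (RESTORE requires Delta Lake 1.2+ or Databricks Runtime 7.4+):

    // List the table's versions and the operation that produced each one
    spark.sql("DESCRIBE HISTORY delta.`/tmp/table`").show()

    // Read the pre-overwrite snapshot through the DataFrame reader
    val oldData = spark.read
      .format("delta")
      .option("versionAsOf", 0)
      .load("/tmp/table")

    // Or roll the whole table back to version 0 in place
    spark.sql("RESTORE TABLE delta.`/tmp/table` TO VERSION AS OF 0")

Note that time travel only works while the old data files still exist: running VACUUM with its default 7-day retention will eventually remove the files behind version 0, leaving only the current version readable.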
    
Nick Karpov