import spark.implicits._

// First write: age is an Int, so the Delta table at /location is created with age as IntegerType
val data = Seq(("James", "Sales", 34))
val df1 = data.toDF("name", "dept", "age")
df1.printSchema()
df1.write.option("mergeSchema", "true").format("delta").save("/location")

// Second write: age is a String ("34"), so the schema no longer matches the existing table
val data2 = Seq(("Tiger", "Sales", "34"))
val df2 = data2.toDF("name", "dept", "age")
df2.printSchema()
df2.write.option("mergeSchema", "true").format("delta").save("/location")
df2.show(false)

When we write df2, it fails because age in the Delta table is IntegerType while age in df2 is StringType. How do we handle such a situation so that the code deals with this case smoothly?

  • You will have to provide the mode, i.e. whether you want to append or overwrite the data. You also need to set the overwriteSchema option to true. – Nikunj Kakadiya Dec 17 '21 at 05:09
  • See this link: https://medium.com/@amany.m.abdelhalim/appending-overwriting-with-different-schema-to-delta-lake-vs-parquet-6b39c4a5d5dc – Nikunj Kakadiya Dec 17 '21 at 05:09

1 Answer


You can set the overwriteSchema option to true (together with overwrite mode) and that should work. mergeSchema only lets you add new columns to an existing Delta table; it cannot change the type of an existing column from IntegerType to StringType, so the table schema has to be overwritten instead.

val data = Seq(("James", "Sales", 34))
val df1 = data.toDF("name", "dept", "age")
df1.printSchema()
df1.write.option("mergeSchema", "true").format("delta").save("/location")

// Overwrite both the data and the schema so the table picks up age as StringType
val data2 = Seq(("Tiger", "Sales", "34"))
val df2 = data2.toDF("name", "dept", "age")
df2.printSchema()
df2.write.option("overwriteSchema", "true").mode("overwrite").format("delta").save("/location")
Nikunj Kakadiya
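
If you want to keep the rows that are already in the table rather than overwrite them, another option is to cast the incoming DataFrame to the table's existing schema before appending. This is a minimal sketch, not part of the original answer, assuming the Delta table at /location already exists and the only mismatch is column types that can be cast (df2 and the /location path are the ones from the question):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Read the current schema of the Delta table, cast each matching column of the
// incoming DataFrame to that type, then append without touching the table schema.
val targetSchema = spark.read.format("delta").load("/location").schema

val df2Aligned: DataFrame = targetSchema.fields.foldLeft(df2) { (df, field) =>
  df.withColumn(field.name, col(field.name).cast(field.dataType))
}

df2Aligned.write.format("delta").mode("append").save("/location")

This keeps age as IntegerType in the table; values that cannot be cast (for example a non-numeric string) become null on append, so overwriteSchema remains the right choice when you actually want the column's type to change.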