How to support addition of new columns to dataset case class while reading old delta file without new column?

Asked Jul 18 '23 at 10:18

Active Jul 18 '23 at 19:00

Viewed 17 times

I have an existing Delta file with 4 columns in its schema, which I convert into a Dataset at runtime using these case classes:

case class MyObj2(x: Int)
case class MyObj1(p: MyObj3, q: MyObj3)
case class MyCaseClass(a: Int, b: MyObj1, c: Int, d: MyObj2)

Now I have added a new field to the case class MyObj2:

case class MyObj2(x: Int, y: Int)

This is how I read the Delta file as a Dataset:

import spark.implicits._ // provides the encoder that .as[MyCaseClass] needs

val df = spark.read.format("delta").load(path)
df.as[MyCaseClass]

When I try to convert the existing Delta file (written with the old schema) into a Dataset with the new schema, I get a schema mismatch error.

I have already tried:

Creating an empty DataFrame with the new schema and merging the schemas. This fails on other fields such as MyObj1: the Delta file holds a map of key-value pairs for that field, so the schemas don't match when merged, even though the same data converts fine with the .as method.
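For reference, one possible way around the mismatch described above is to make the new field optional and add it to the struct as a null before calling .as. This is only a sketch under assumptions: it needs Spark 3.1 or later for Column.withField, it replaces the Delta read with a small in-memory DataFrame so it runs on its own, it drops field b (MyObj3 is not shown in the question), and the names OldObj2, OldRow, ReadOldSchema, and addMissingY are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical stand-ins for the old on-disk schema (field b omitted for brevity).
case class OldObj2(x: Int)
case class OldRow(a: Int, c: Int, d: OldObj2)

// New schema: the added field is Option[Int], so a null in old data becomes None.
case class MyObj2(x: Int, y: Option[Int])
case class MyCaseClass(a: Int, c: Int, d: MyObj2)

object ReadOldSchema {
  // Adds d.y as a null Int column when the loaded data predates the new field.
  def addMissingY(df: DataFrame): DataFrame =
    if (df.select("d.*").columns.contains("y")) df
    else df.withColumn("d", col("d").withField("y", lit(null).cast("int")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("read-old-schema")
      .getOrCreate()
    import spark.implicits._

    // Assumption: stands in for spark.read.format("delta").load(path) on old data.
    val oldDf = Seq(OldRow(1, 2, OldObj2(3))).toDF()

    // Patch the struct first, then convert to the new case class.
    val ds = addMissingY(oldDf).as[MyCaseClass]
    ds.show()

    spark.stop()
  }
}
```

If y has to stay a plain Int in the case class, the same idea should work with a default value in place of the null, e.g. lit(0) instead of lit(null).cast("int"); whether a default is meaningful depends on the data.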

edited Jul 18 '23 at 19:00

Koedlt


asked Jul 18 '23 at 10:18

P Mittal


0 Answers