
I'm new to Spark and I'm trying to figure out if there is a way to save complex objects (nested) or complex JSON as Parquet in Spark. I'm aware of the Kite SDK, but I understand it uses MapReduce.

I looked around but I was unable to find a solution.

Thanks for your help.

IceMan
  • Yes, it's possible to save nested objects as Parquet with Spark. Do you have an example of the data and the expected result? – Mehrez Apr 13 '17 at 14:43
  • @Mehrez I'm not sure what you mean by "expected result". The result would be a Parquet file that understands nested structures as supported by the Parquet spec (definition and repetition levels). – IceMan Apr 13 '17 at 16:00
  • Your problem isn't clear: do you have an exception in your code, or are you looking for a code sample that saves a nested object as Parquet? – Mehrez Apr 14 '17 at 09:11

1 Answer

// Nested case classes: Person contains an Address
case class Address(city: String, block: String)
case class Person(name: String, age: String, address: Address)

// Build an RDD of nested objects and convert it to a DataFrame
val people = sc.parallelize(List(Person("a", "b", Address("a", "b")), Person("c", "d", Address("c", "d"))))
val df = sqlContext.createDataFrame(people)

// Write it out as Parquet; the nested Address becomes a struct column
df.write.mode("overwrite").parquet("/tmp/people.parquet")
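To double-check that the nested structure survives, the file can be read back and its schema inspected. A quick sketch, assuming the same sqlContext and the path written above:

// Read the Parquet file back; address should appear as a nested struct column
val readBack = sqlContext.read.parquet("/tmp/people.parquet")
readBack.printSchema()              // expect: name, age, address: struct<city:string, block:string>
readBack.select("address.city").show()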

This answer on SO helped: Spark SQL: Nested classes to parquet error.

But it was hard to find, so I've answered my own question here. Hope this helps someone else looking for an example.
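Since the question also mentions complex JSON, a similar route should work by reading the JSON straight into a DataFrame and writing it out as Parquet. This is only a sketch using the same old sqlContext API; the input and output paths are placeholders:

// Spark infers a nested schema from the JSON structure
// ("/tmp/people.json" is a hypothetical path used only for illustration)
val jsonDf = sqlContext.read.json("/tmp/people.json")
jsonDf.printSchema()                // nested JSON objects show up as struct columns

// Writing the DataFrame as Parquet preserves that nested structure
jsonDf.write.mode("overwrite").parquet("/tmp/people_json.parquet")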

IceMan