I have read an Avro file into a Spark RDD and need to convert it into a Spark SQL DataFrame. How do I do that?
This is what I have done so far:
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable
val path = "hdfs://dds-nameservice/user/ghagh/"
val avroRDD = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]](path)
When I run:
avroRDD.take(1)
I get back:
res1: Array[(org.apache.avro.mapred.AvroWrapper[org.apache.avro.generic.GenericRecord], org.apache.hadoop.io.NullWritable)] = Array(({"column1": "value1", "column2": "value2", "column3": value3,...
How do I convert this to a Spark SQL DataFrame?
I am using Spark 1.6.
Is there an easy solution for this?
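A minimal sketch of one possible conversion in Spark 1.6, assuming the records only contain the three fields shown above (`column1`, `column2`, `column3`) and that reading every field as a nullable string is acceptable. It maps each `GenericRecord` to a `Row` and passes an explicit `StructType` to `sqlContext.createDataFrame`. The object name `AvroRddToDataFrame`, the helper `toDataFrame`, and the column list are hypothetical; adjust the names and types to match the real Avro schema.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object AvroRddToDataFrame {
  // Hypothetical column list: replace with the field names from the actual Avro schema.
  val columns: Seq[String] = Seq("column1", "column2", "column3")

  // Assumption: every field is treated as a nullable string.
  val schema: StructType =
    StructType(columns.map(name => StructField(name, StringType, nullable = true)))

  def toDataFrame(
      sqlContext: SQLContext,
      avroRDD: RDD[(AvroWrapper[GenericRecord], NullWritable)]): DataFrame = {
    // Copy into a local val so the closure captures only the Seq.
    val cols = columns
    val rowRDD: RDD[Row] = avroRDD.map { case (wrapper, _) =>
      val record = wrapper.datum()
      // Avro strings are usually org.apache.avro.util.Utf8, so convert with toString.
      // Values are copied out immediately because Hadoop record readers may reuse objects.
      Row.fromSeq(cols.map(name => Option(record.get(name)).map(_.toString).orNull))
    }
    sqlContext.createDataFrame(rowRDD, schema)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avro-to-dataframe"))
    val sqlContext = new SQLContext(sc)
    val path = "hdfs://dds-nameservice/user/ghagh/"
    val avroRDD = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable,
      AvroInputFormat[GenericRecord]](path)
    val df = toDataFrame(sqlContext, avroRDD)
    df.printSchema()
    df.show(5)
  }
}
```

If adding a dependency is acceptable, the Databricks spark-avro package (its 2.x releases target Spark 1.x) avoids the manual mapping: `sqlContext.read.format("com.databricks.spark.avro").load(path)` infers the DataFrame schema from the Avro schema directly.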