I want to read Avro files located in Amazon S3 from a Zeppelin notebook. I understand Databricks has a wonderful package for this, spark-avro. What steps do I need to take to bootstrap this jar file onto my cluster and make it work?
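So far, my best guess is to pull the package in through Zeppelin's dependency interpreter before the Spark interpreter starts, along these lines (the artifact coordinates and version here are just my assumption; the right ones presumably depend on the Spark and Scala versions on the cluster):

%dep
// has to run before the Spark interpreter is first used;
// restart the interpreter and re-run this paragraph if needed
z.load("com.databricks:spark-avro_2.11:4.0.0")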
When I run this in my notebook:
val df = sqlContext.read.avro("s3n://path_to_avro_files_in_one_bucket/")
I get the following error:
<console>:34: error: value avro is not a member of org.apache.spark.sql.DataFrameReader
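From what I can tell, the .avro method comes from an implicit that spark-avro adds to DataFrameReader, so even with the jar on the classpath I suspect an import is needed, something like:

// brings read.avro(...) into scope via spark-avro's implicits
import com.databricks.spark.avro._
val df = sqlContext.read.avro("s3n://path_to_avro_files_in_one_bucket/")

or, without the implicit, the long-form variant:

val df = sqlContext.read.format("com.databricks.spark.avro").load("s3n://path_to_avro_files_in_one_bucket/")

Either way, the error above makes me think the jar itself never made it onto my classpath.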
I have had a look at this, but the solution posted there does not seem to work with the latest version of Amazon EMR.
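For what it's worth, I was hoping to avoid a custom bootstrap action entirely by declaring the package in Zeppelin's Spark interpreter settings (or in EMR's spark-defaults classification), along these lines (again, the version is my guess):

spark.jars.packages   com.databricks:spark-avro_2.11:4.0.0

but I have not been able to confirm that this is the supported route on EMR.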
If someone could give me pointers, that would really help.