
I'm trying to read an Avro file from a Jupyter notebook in Azure HDInsight 4.0 with Spark 2.4. I'm not able to properly provide the .jar file to

I've tried the approaches suggested in "How to use Avro on HDInsight Spark/Jupyter?" and in https://learn.microsoft.com/en-in/azure/hdinsight/spark/apache-spark-jupyter-notebook-use-external-packages, but I guess they apply to Spark 2.3:

%%configure
{ "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.11:4.0.0" }}

This produces the error message:

pyspark.sql.utils.AnalysisException: 'Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;'

MDP89
  • Try this solution https://stackoverflow.com/questions/56618866/trouble-reading-avro-files-in-jupyter-notebook-using-pyspark/56633190#56633190 – Ranga Vure Oct 25 '19 at 18:35
  • That solution works in other environments (tested successfully), but it fails on HDInsight: the first command creates the Spark session, so spark-submit is launched even before the import os runs – MDP89 Oct 25 '19 at 19:14

1 Answer


The solution that seems to work is:

%%configure -f 
{ "conf": {"spark.jars.packages": "org.apache.spark:spark-avro_2.11:2.4.0" }}
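
The package coordinate above follows the pattern org.apache.spark:spark-avro_<Scala version>:<Spark version>, which replaces the older com.databricks coordinate from Spark 2.4 onward. As a rough sketch of that pattern only, the helper below (avro_configure_cell is a hypothetical name, not part of sparkmagic or HDInsight) builds the cell text for a given Spark and Scala version. It assumes the spark-avro version should match the cluster's Spark version and that the cluster's Spark build uses Scala 2.11, as in the cell above.

```python
import json


def avro_configure_cell(spark_version: str, scala_version: str = "2.11") -> str:
    """Return the text of a %%configure -f cell that pulls in spark-avro.

    Hypothetical helper for illustration only. It assumes the spark-avro
    artifact version is the same as the cluster's Spark version.
    """
    # Maven coordinate: group:artifact_<scala version>:<spark version>
    package = f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"
    body = {"conf": {"spark.jars.packages": package}}
    # First line is the cell magic, second line is the JSON session config.
    return "%%configure -f\n" + json.dumps(body)


print(avro_configure_cell("2.4.0"))
```

The -f flag forces the current Livy session to be dropped and recreated with the new configuration, so the cell should be run before any other work in the notebook. Once the session has restarted, the file should be readable in a later cell with spark.read.format("avro").load(path), where path is a placeholder for your own file location.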
MDP89