
I have just built Spark 1.2.1, and I am trying to run the Avro example, but it fails.

    cd spark-1.2.1
    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.2 -DskipTests clean package

What am I doing wrong? I run:

    cd spark-1.2.1
    bin/spark-submit --driver-class-path examples/target/spark-examples_2.10-1.2.1.jar examples/src/main/python/avro_inputformat.py examples/src/main/resources/users.avro

And I end up with the following error:

    py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
    : java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroKey
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
1 Answer

I found the trick in an answer to this question: Spark: Writing to Avro file

I needed to add the following block to the Maven pom.xml file before building; then it worked.

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <version>1.7.7</version>
      <classifier>hadoop2</classifier>
    </dependency>

This is apparently related to this issue: https://issues.apache.org/jira/browse/SPARK-3039
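As an alternative to rebuilding Spark, the missing classes can sometimes be supplied at submit time. The sketch below assumes the avro-mapred jar with the hadoop2 classifier has already been downloaded locally; the `/path/to/...` location is hypothetical and must be adjusted:

```shell
# Sketch: put avro-mapred (hadoop2 classifier) on the driver classpath and
# ship it to executors with --jars, instead of rebuilding Spark.
# The jar path is an assumption -- point it at your local copy.
cd spark-1.2.1
bin/spark-submit \
  --driver-class-path "examples/target/spark-examples_2.10-1.2.1.jar:/path/to/avro-mapred-1.7.7-hadoop2.jar" \
  --jars /path/to/avro-mapred-1.7.7-hadoop2.jar \
  examples/src/main/python/avro_inputformat.py \
  examples/src/main/resources/users.avro
```

Note that `--jars` alone distributes the jar to executors; since the `ClassNotFoundException` here occurs in the driver JVM, the jar is also appended to `--driver-class-path`.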
