
I am currently using Apache Zeppelin 0.8. I tried loading a CSV file like this:

val df = spark.read.option("header", "true").option("inferSchema", "true").csv("/path/to/csv/name.csv")

I have also tried this:

val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/path/to/csv/name.csv")

However, both fail, printing the following:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()Lorg/apache/hadoop/fs/FileSystem$Statistics$StatisticsData;

NOTE: the problem is solved if I point Zeppelin at my own Spark build via the SPARK_HOME environment variable in zeppelin-env.sh. However, I would still like a solution that does not require this, as I have a few other libraries that do not work with that version of Spark.
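For reference, the workaround mentioned above is a one-line change in conf/zeppelin-env.sh (the path below is a placeholder for my own Spark build):

```shell
# conf/zeppelin-env.sh -- point Zeppelin at an external Spark build
# (path is a placeholder; use the location of your own build)
export SPARK_HOME=/path/to/your/spark
```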

Ramesh Maharjan
Skeftical

1 Answer


It looks like the effective classpath of the Spark runtime contains a conflicting version of the Hadoop filesystem library. This may be caused by your fat jar bringing in an incompatible version.

If you open the Spark UI's Environment tab, you can see all the jar files on the classpath; there you can try to identify which library is causing the trouble.
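You can also check from inside a Zeppelin paragraph which jar a class is actually loaded from, using plain JVM reflection. A minimal sketch (the `jarOf` helper is my own; the class name passed in is the one from your stack trace):

```scala
// Sketch: report which jar (if any) provides a given class on the current classpath.
// Returns None if the class is missing or was loaded by the bootstrap classloader
// (which has a null CodeSource).
def jarOf(className: String): Option[String] =
  scala.util.Try(Class.forName(className))
    .toOption
    .flatMap(cls => Option(cls.getProtectionDomain.getCodeSource))
    .map(_.getLocation.toString)

// In a Zeppelin paragraph, check the class from the stack trace:
println(jarOf("org.apache.hadoop.fs.FileSystem"))
```

If the printed path points into your fat jar rather than the Zeppelin/Spark installation, that confirms the conflict.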

If you're building a fat jar, have a look at its contents to see whether it also contains Hadoop classes:

jar -tf /path/to/your/jar | grep "org/apache/hadoop/fs/FileSystem"

If it does, you should mark your Hadoop (and Spark) dependencies as provided in Maven/sbt, so they are compiled against but not packaged into the fat jar.
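In sbt, that could look like the following sketch (the artifact names and version numbers are assumptions; match whatever Spark and Hadoop versions your Zeppelin installation ships with):

```scala
// build.sbt (sketch) -- keep Spark/Hadoop out of the fat jar
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-sql"     % "2.2.0" % "provided",
  "org.apache.hadoop"  % "hadoop-client" % "2.7.3" % "provided"
)
```

The Maven equivalent is setting `<scope>provided</scope>` on the same artifacts.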

botchniaque