Spark pulls in Guava 16.0.1 transitively via one of its Hadoop dependencies (as far as I can tell), but in my Maven project I also need cassandra-unit, which forces Guava 21.0. That in turn breaks Spark when I try to read a file with the `sparkSession.sparkContext.textFile` method. Has anyone else come across a similar problem, and how did you solve it? Here's the stack trace:
*** RUN ABORTED ***
java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
...
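For context, the obvious first lever would be pinning a single Guava version with `dependencyManagement`. The snippet below is only a sketch of that idea, not a working fix: pinning 16.0.1 keeps Hadoop's `FileInputFormat` happy, but it would likely leave cassandra-unit running against an older Guava than it was built with.

```xml
<!-- Sketch only: pin one Guava version for the whole build.
     16.0.1 satisfies Hadoop's use of the public Stopwatch constructor,
     but cassandra-unit may then miss Guava 21.0 APIs at test time. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>16.0.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```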
UPDATE: Since it was suggested in the comments that this might be a duplicate of another Stack Overflow question, I did try forcing versions 2.7.2 and 2.9.0 of hadoop-mapreduce-client-core and hadoop-common. I don't think that works, though, as it breaks elsewhere with java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration, a class from the old 1.10 line of commons-configuration that something else evidently relies on. It's a vicious circle... A sketch of what I tried is below.
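For reference, the version-forcing attempt looked roughly like this. The commons-configuration entry at the end is only an untested guess at how one might paper over the NoClassDefFoundError, not something I've got working.

```xml
<!-- Roughly the override I tried: newer Hadoop no longer calls the
     Guava Stopwatch constructor, but the build then breaks elsewhere. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.9.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.9.0</version>
</dependency>
<!-- Untested guess: declaring the old commons-configuration explicitly
     might supply the missing org.apache.commons.configuration.Configuration. -->
<dependency>
  <groupId>commons-configuration</groupId>
  <artifactId>commons-configuration</artifactId>
  <version>1.10</version>
</dependency>
```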
UPDATE: Originally my code went through a library that used the SparkContext API, invoking the `sparkSession.sparkContext.textFile` method. The issue is no longer evident when I switch to the SparkSession API with `sparkSession.read`.
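A minimal sketch of the switch that made the difference for me (the file path and app name are just placeholders):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object ReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("guava-conflict-check") // placeholder app name
      .master("local[*]")
      .getOrCreate()

    // Old route that blew up with the IllegalAccessError, because it goes
    // through org.apache.hadoop.mapred.FileInputFormat:
    // val rddLines = spark.sparkContext.textFile("data/input.txt")

    // New route via the SparkSession read API, which did not hit the error:
    val lines: Dataset[String] = spark.read.textFile("data/input.txt")
    println(lines.count())

    spark.stop()
  }
}
```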