
I have the following code:

import org.apache.spark.sql.SparkSession
        .
        .
        .
    val spark = SparkSession
      .builder()
      .appName("PTAMachineLearner")
      .getOrCreate()

When it executes, I get the following error:

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at org.apache.spark.sql.SparkSession$Builder.config(SparkSession.scala:750)
    at org.apache.spark.sql.SparkSession$Builder.appName(SparkSession.scala:741)
    at com.acme.pta.accuracy.ml.PTAMachineLearnerModel.getDF(PTAMachineLearnerModel.scala:52)

The code compiles and builds just fine. Here are the dependencies:

scalaVersion := "2.11.11"
libraryDependencies ++= Seq(
  // Spark dependencies
  "org.apache.spark" %% "spark-hive" % "2.1.1",
  "org.apache.spark" %% "spark-mllib" % "2.1.1",
  // Third-party libraries
  "net.sf.jopt-simple" % "jopt-simple" % "5.0.3",
  "com.amazonaws" % "aws-java-sdk" % "1.3.11",
  "org.apache.logging.log4j" % "log4j-api" % "2.8.2",
  "org.apache.logging.log4j" % "log4j-core" % "2.8.2",
  "org.apache.logging.log4j" %% "log4j-api-scala" % "2.8.2",
  "com.typesafe.play" %% "play-ahc-ws-standalone" % "1.0.0-M9",
  "net.liftweb" % "lift-json_2.11" % "3.0.1"
)

I am executing the code like this:

/Users/paulreiners/spark-2.1.1-bin-hadoop2.7/bin/spark-submit \
      --class "com.acme.pta.accuracy.ml.CreateRandomForestRegressionModel" \
      --master local[4] \
      target/scala-2.11/acme-pta-accuracy-ocean.jar \

I had this all running with Spark 1.6. I'm trying to upgrade to Spark 2, but am missing something.

Paul Reiners

3 Answers


The class ArrowAssoc is indeed present in your Scala library (see the Scala API docs). But you are getting the error inside the Spark library, so the Spark build you are running against is evidently not compatible with Scala 2.11; it was probably compiled against an older Scala version. If you look at the older Scala API docs, ArrowAssoc has changed a lot; for example, it is now implicit, with a number of implicit dependencies. Make sure your Spark and Scala versions are compatible.
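
One quick way to confirm what is actually on the runtime classpath is to print the Scala and Spark versions just before the failing call. A minimal sketch (the object name here is made up for illustration):

import org.apache.spark.SPARK_VERSION

// Hypothetical diagnostic entry point: prints the versions that
// spark-submit actually put on the classpath.
object VersionCheck {
  def main(args: Array[String]): Unit = {
    // The scala-library jar in use, e.g. "version 2.10.5" or "version 2.11.11"
    println(s"Scala library: ${scala.util.Properties.versionString}")
    // The Spark jars in use, e.g. "2.1.1"
    println(s"Spark: $SPARK_VERSION")
  }
}

If the first line reports a 2.10.x library while the application jar was built for 2.11, that mismatch is exactly the kind of thing that produces the NoSuchMethodError above.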

Apurva Singh
  • I want to use at least Spark 2.0, since that is the first version that has the model saving and loading capability I need. So what version of Scala do I need and where can this be looked up? – Paul Reiners Jun 15 '17 at 19:13
  • I am using compatible versions: "For the Scala API, Spark 2.1.1 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x)." from https://spark.apache.org/docs/latest/ – Paul Reiners Jun 15 '17 at 19:19
  • @paul Your dependencies are correct; the problem is in the runtime environment. 1) If you have preinstalled libraries on the worker nodes, then I guess you need to update them; spark-1.6 uses scala-2.10 by default. 2) If you deploy a fat jar, then your packaging is wrong (sbt-assembly settings, for example). – Zernike Jun 15 '17 at 19:30
  • I'm deploying a fat JAR. I didn't see anything in the assembly settings that seemed specific to the version of Scala I'm using. – Paul Reiners Jun 15 '17 at 19:39
  • I did find this in build.sbt: // A special option to exclude Scala itself from our assembly JAR, since Spark // already bundles in Scala. assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false) So I changed the false to true, but that didn't fix it. So I deleted that line, but that didn't fix it either. – Paul Reiners Jun 15 '17 at 19:48
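
For reference, here is the sbt-assembly arrangement being discussed in the comments, written out as a minimal sketch. The plugin version, the "provided" scoping, and the file layout are assumptions, not taken from the question:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt (excerpt)
scalaVersion := "2.11.11"

// Spark already bundles scala-library, so keep Scala out of the fat JAR
// (this is the setting quoted in the comment above).
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

// Marking Spark itself as "provided" keeps the Spark (and Scala) jars that
// ship with spark-submit as the only ones on the runtime classpath.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-hive"  % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.1" % "provided"
)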

I found the problem: I had Scala 2.10.5 installed on my system, so either sbt or spark-submit was picking that up when 2.11.11 was expected.

Paul Reiners

I had the same issue, but in my case the problem was that I deployed the jar to a Spark 1.x cluster whereas the code was written for Spark 2.x.

So, if you see this error, check the Spark and Scala versions your code was built against, as compared to the versions actually installed.
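
If it helps, a tiny fail-fast guard along those lines; the expected binary version here ("2.11") is just an example:

// Abort with a clear message at startup instead of hitting a
// NoSuchMethodError deep inside Spark later on.
object RequireScala211 {
  def check(): Unit = {
    val runtime = scala.util.Properties.versionNumberString // e.g. "2.11.11"
    require(runtime.startsWith("2.11"),
      s"Expected a Scala 2.11.x runtime but found $runtime; " +
        "check the Spark distribution used by spark-submit.")
  }
}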

Sruthi Poddutur