
I have the following code:

import org.apache.spark.sql.SparkSession
        .
        .
        .
    val spark = SparkSession
      .builder()
      .appName("PTAMachineLearner")
      .getOrCreate()

When it executes, I get the following error:

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at org.apache.spark.sql.SparkSession$Builder.config(SparkSession.scala:750)
    at org.apache.spark.sql.SparkSession$Builder.appName(SparkSession.scala:741)
    at com.acme.pta.accuracy.ml.PTAMachineLearnerModel.getDF(PTAMachineLearnerModel.scala:52)

The code compiles and builds just fine. Here are the dependencies:

scalaVersion := "2.11.11"
libraryDependencies ++= Seq(
  // Spark dependencies
  "org.apache.spark" %% "spark-hive" % "2.1.1",
  "org.apache.spark" %% "spark-mllib" % "2.1.1",
  // Third-party libraries
  "net.sf.jopt-simple" % "jopt-simple" % "5.0.3",
  "com.amazonaws" % "aws-java-sdk" % "1.3.11",
  "org.apache.logging.log4j" % "log4j-api" % "2.8.2",
  "org.apache.logging.log4j" % "log4j-core" % "2.8.2",
  "org.apache.logging.log4j" %% "log4j-api-scala" % "2.8.2",
  "com.typesafe.play" %% "play-ahc-ws-standalone" % "1.0.0-M9",
  "net.liftweb" % "lift-json_2.11" % "3.0.1"
)

I am executing the code like this:

/Users/paulreiners/spark-2.1.1-bin-hadoop2.7/bin/spark-submit \
      --class "com.acme.pta.accuracy.ml.CreateRandomForestRegressionModel" \
      --master local[4] \
      target/scala-2.11/acme-pta-accuracy-ocean.jar \

I had this all running with Spark 1.6. I'm trying to upgrade to Spark 2, but am missing something.

Paul Reiners

3 Answers


The class ArrowAssoc is indeed present in your Scala library (see the Scala API docs). But you are getting the error inside the Spark library, so the Spark build you are running against is evidently not compatible with Scala 2.11; it was probably compiled against an older Scala version. If you look at the older Scala API docs, ArrowAssoc has changed a lot; for example, it is now implicit, with a number of implicit dependencies. Make sure your Spark and Scala versions are compatible.
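
One quick way to confirm what is actually on the runtime classpath is to print the Scala and Spark versions just before the failing call. A minimal sketch (the object name here is made up for illustration):

import org.apache.spark.SPARK_VERSION

// Hypothetical diagnostic entry point: prints the versions that
// spark-submit actually put on the classpath.
object VersionCheck {
  def main(args: Array[String]): Unit = {
    // The scala-library jar in use, e.g. "version 2.10.5" or "version 2.11.11"
    println(s"Scala library: ${scala.util.Properties.versionString}")
    // The Spark jars in use, e.g. "2.1.1"
    println(s"Spark: $SPARK_VERSION")
  }
}

If the first line reports a 2.10.x library while the application jar was built for 2.11, that mismatch is exactly the kind of thing that produces the NoSuchMethodError above.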

Apurva Singh
  • I want to use at least Spark 2.0, since that is the first version that has the model saving and loading capability I need. So what version of Scala do I need and where can this be looked up? – Paul Reiners Jun 15 '17 at 19:13
  • I am using compatible versions: "For the Scala API, Spark 2.1.1 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x)." from https://spark.apache.org/docs/latest/ – Paul Reiners Jun 15 '17 at 19:19
  • @paul Your dependencies are correct; the problem is in the runtime environment. 1) If you have preinstalled libraries on the worker nodes, then I guess you need to update them; spark-1.6 uses scala-2.10 by default. 2) If you deploy a fat jar, then your packaging is wrong (sbt-assembly settings, for example). – Zernike Jun 15 '17 at 19:30
  • I'm deploying a fat JAR. I didn't see anything in the assembly settings that seemed specific to the version of Scala I'm using. – Paul Reiners Jun 15 '17 at 19:39
  • I did find this in build.sbt: // A special option to exclude Scala itself from our assembly JAR, since Spark // already bundles in Scala. assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false) So I changed the false to true, but that didn't fix it. So I deleted that line, but that didn't fix it either. – Paul Reiners Jun 15 '17 at 19:48
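
For reference, here is the sbt-assembly arrangement being discussed in the comments, written out as a minimal sketch. The plugin version, the "provided" scoping, and the file layout are assumptions, not taken from the question:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt (excerpt)
scalaVersion := "2.11.11"

// Spark already bundles scala-library, so keep Scala out of the fat JAR
// (this is the setting quoted in the comment above).
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

// Marking Spark itself as "provided" keeps the Spark (and Scala) jars that
// ship with spark-submit as the only ones on the runtime classpath.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-hive"  % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.1" % "provided"
)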

I found the problem: I had Scala 2.10.5 installed on my system, so either sbt or spark-submit was picking that up when 2.11.11 was expected.

Paul Reiners

I had the same issue, but in my case the problem was that I deployed the jar to a Spark 1.x cluster whereas the code was written for Spark 2.x.

So, if you see this error, check the Spark and Scala versions your code was built against, as compared to the versions actually installed.
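
If it helps, a tiny fail-fast guard along those lines; the expected binary version here ("2.11") is just an example:

// Abort with a clear message at startup instead of hitting a
// NoSuchMethodError deep inside Spark later on.
object RequireScala211 {
  def check(): Unit = {
    val runtime = scala.util.Properties.versionNumberString // e.g. "2.11.11"
    require(runtime.startsWith("2.11"),
      s"Expected a Scala 2.11.x runtime but found $runtime; " +
        "check the Spark distribution used by spark-submit.")
  }
}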

Sruthi Poddutur