I'm having a problem using spark.ml.util.SchemaUtils on Spark v1.6.0. I get the following error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.ml.util.SchemaUtils$.appendColumn(Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;Lorg/apache/spark/sql/types/DataType;)Lorg/apache/spark/sql/types/StructType;
        at org.apache.spark.ml.SchemaTest$.main(SchemaTest.scala:17)

when running this minimal example on my cluster (inspired by the library I ultimately want to use):

package org.apache.spark.ml

import org.apache.spark.ml.util.SchemaUtils
import org.apache.spark.sql.types._
import org.apache.spark.mllib.linalg.VectorUDT

object SchemaTest {

  def main(args: Array[String]): Unit = {
    val schema: StructType =
      StructType(
        StructField("a", IntegerType, true) :: StructField("b", LongType, false) :: Nil
      )

    val transformed = SchemaUtils.appendColumn(schema, "test", new VectorUDT()) // this call throws the NoSuchMethodError on the cluster

  }
}

However, the same example launched locally on my desktop runs without problems.

From what I saw online (for example here), this kind of error message is often linked to a version mismatch between the compilation and runtime environments, but my program, my local Spark distribution, and my cluster distribution all use the same Spark & MLlib version (1.6.0), the same Scala version (2.10.6), and the same Java version (7).

I checked the Spark 1.6.0 source code, and appendColumn does exist in org.apache.spark.ml.util.SchemaUtils with the right signature (although SchemaUtils is not mentioned in the org.apache.spark.ml.util API documentation).
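
A small runtime check along these lines (the SchemaUtilsCheck object name is just for illustration) can print which jar the class is actually loaded from and which appendColumn overloads are visible:

object SchemaUtilsCheck {
  def main(args: Array[String]): Unit = {
    // Scala compiles a singleton object to a class named with a trailing "$".
    val clazz = Class.forName("org.apache.spark.ml.util.SchemaUtils$")
    // Where the class was loaded from, i.e. the jar on the effective classpath.
    Option(clazz.getProtectionDomain.getCodeSource).foreach(cs => println(cs.getLocation))
    // The appendColumn overloads visible through reflection.
    clazz.getMethods.filter(_.getName == "appendColumn").foreach(println)
  }
}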

ETA: Extract from my pom.xml file:

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.10.6</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.6.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.10</artifactId>
        <version>1.6.0</version>
        <scope>provided</scope>
    </dependency>

</dependencies>

1 Answer


You need to check all the places where you can change the classpath for the jobs running on the cluster. For example, these properties:

  • spark.driver.extraClassPath
  • spark.driver.userClassPathFirst
  • spark.executor.extraClassPath
  • spark.executor.userClassPathFirst
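
A quick way to see what the running application actually picked up for these settings is to print them from the driver; a minimal sketch (the ConfCheck object and app name are just illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object ConfCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("conf-check"))
    // Print the classpath-related settings the driver actually sees.
    Seq("spark.driver.extraClassPath",
        "spark.driver.userClassPathFirst",
        "spark.executor.extraClassPath",
        "spark.executor.userClassPathFirst")
      .foreach(k => println(s"$k = ${sc.getConf.getOption(k).getOrElse("<not set>")}"))
    sc.stop()
  }
}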

You should also examine your assembly process to ensure that the dependencies you think you're packaging are in fact what you're packaging.

– Vidya
  • The conf files on my local Spark are empty, and the Environment tab of the Spark UI on the cluster does not show any classpath parameters, so I assume both versions use the default values. My packaging also seems to have the correct versions for everything (I added the pom dependencies to my OP). – datasock Mar 17 '17 at 09:40