I'm having a problem using spark.ml.util.SchemaUtils on Spark v1.6.0. I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.ml.util.SchemaUtils$.appendColumn(Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;Lorg/apache/spark/sql/types/DataType;)Lorg/apache/spark/sql/types/StructType;
at org.apache.spark.ml.SchemaTest$.main(SchemaTest.scala:17)
when running this minimal example on my cluster (inspired by the library I ultimately want to use):
package org.apache.spark.ml

import org.apache.spark.ml.util.SchemaUtils
import org.apache.spark.sql.types._
import org.apache.spark.mllib.linalg.VectorUDT

object SchemaTest {
  def main(args: Array[String]): Unit = {
    val schema: StructType = StructType(
      StructField("a", IntegerType, true) ::
      StructField("b", LongType, false) :: Nil
    )
    val transformed = SchemaUtils.appendColumn(schema, "test", new VectorUDT())
  }
}
However, the same example launched locally on my desktop runs without problems.
From what I saw online (for example here), this kind of error message is usually linked to a version mismatch between the compile-time and runtime environments. However, my program, my local Spark distribution, and my cluster distribution all use the same Spark & MLlib version (1.6.0), the same Scala version (2.10.6), and the same Java version (7).
I checked the Spark 1.6.0 source code, and the appendColumn method does exist in org.apache.spark.ml.util.SchemaUtils with the right signature (although SchemaUtils is not mentioned in the org.apache.spark.ml.util API documentation).
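To rule out a mismatch on the cluster itself, I put together a small diagnostic (this is my own hypothetical helper, not anything from Spark): it uses plain JVM reflection to report which jar a class was actually loaded from and which overloads of a method it exposes at runtime. Running it via spark-submit on the cluster should show whether the SchemaUtils that ends up on the executor/driver classpath really has the appendColumn signature I compiled against.

```scala
// Hypothetical diagnostic helper (not part of Spark): reports where a class
// was loaded from and lists the declared methods matching a given name,
// which helps spot compile-time vs. runtime version mismatches.
object ClassProbe {
  def probe(className: String, methodName: String): Option[String] = {
    try {
      val cls = Class.forName(className)
      // Code source is null for bootstrap-classpath classes (e.g. java.lang.*)
      val location = Option(cls.getProtectionDomain.getCodeSource)
        .map(_.getLocation.toString)
        .getOrElse("(bootstrap classpath)")
      val signatures = cls.getDeclaredMethods
        .filter(_.getName == methodName)
        .map(_.toString)
      Some(s"$className loaded from $location\n" + signatures.mkString("\n"))
    } catch {
      case _: ClassNotFoundException => None
    }
  }

  def main(args: Array[String]): Unit = {
    // "SchemaUtils$" is the JVM class backing the Scala companion object.
    // On the cluster this would print the providing jar and the available
    // appendColumn overloads; locally without Spark it reports "not found".
    val report = probe("org.apache.spark.ml.util.SchemaUtils$", "appendColumn")
    println(report.getOrElse("class not found on classpath"))
  }
}
```

Comparing the jar path and the printed signatures against my compile-time expectation (StructType, String, DataType) should tell me whether the cluster is quietly picking up a different Spark build.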
ETA: Extract from my pom.xml file:
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.6</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>