
I get a strange compilation error when using Scala 2.11 that does not occur with 2.12 (working with Spark 2.2.1).

Here is my Scala code:

val spark = SparkSession.builder
  .master("local")
  .appName("spark rmd connect import")
  .enableHiveSupport()
  .getOrCreate()

// LOAD
var time = System.currentTimeMillis()
val r_log_o = spark.read.format("orc").load("log.orc")
val r_log = r_log_o.drop(r_log_o.col("id"))
System.currentTimeMillis() - time

time = System.currentTimeMillis()
r_log_o.toJavaRDD.cache()
  .map((x: Row) => { x(4).asInstanceOf[Timestamp] })
  .reduce(minTs(_, _))
System.currentTimeMillis() - time

where

def minTs(x: Timestamp, y: Timestamp): Timestamp =
  if (x.compareTo(y) < 0) x else y

My pom.xml is configured as below:

<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.3.1</version>
            <configuration>
                <scalaVersion>2.11</scalaVersion>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.2.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.2.1</version>
    </dependency>
</dependencies>

If I compile with <scalaVersion>2.12</scalaVersion>, it compiles. Using Scala 2.11, I get the following error:

[INFO] /root/project/src/main/java:-1: info: compiling
[INFO] /root/project/src/main/scala:-1: info: compiling
[INFO] Compiling 2 source files to /root/rmd-connect-spark/target/classes at 1515426201592
[ERROR] /root/rmd-connect-spark/src/main/scala/SparkConnectTest.scala:40: error: type mismatch;
[ERROR]  found   : org.apache.spark.sql.Row => java.sql.Timestamp
[ERROR]  required: org.apache.spark.api.java.function.Function[org.apache.spark.sql.Row,?]
[ERROR]     .map((x : Row) => {x(4).asInstanceOf[Timestamp]})
[ERROR]                    ^
[ERROR] one error found
[INFO] BUILD FAILURE

NOTE: this is not a problem with the Spark runtime; it is a problem with using Scala 2.11 against the Spark API.

giusy

1 Answer


You have a JavaRDD, so you need to use the Java API and pass an org.apache.spark.api.java.function.Function instead of a Scala function. Scala 2.12 added support for automatically converting Scala functions to Java SAM interfaces, which is why this code works with Scala 2.12.
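
For example, here is a minimal sketch of what the map/reduce step could look like on Scala 2.11 with the Java function interfaces written out explicitly (it reuses your r_log_o and minTs; the minTimestamp name is just for illustration):

import java.sql.Timestamp
import org.apache.spark.api.java.function.{Function => JFunction, Function2 => JFunction2}
import org.apache.spark.sql.Row

// Scala 2.11 does not convert lambdas to Java SAM types automatically,
// so the Java function interfaces have to be implemented explicitly.
val minTimestamp: Timestamp = r_log_o.toJavaRDD.cache()
  .map(new JFunction[Row, Timestamp] {
    override def call(x: Row): Timestamp = x(4).asInstanceOf[Timestamp]
  })
  .reduce(new JFunction2[Timestamp, Timestamp, Timestamp] {
    override def call(x: Timestamp, y: Timestamp): Timestamp = minTs(x, y)
  })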

Use the Scala API instead of Java if you are going to be coding in Scala.
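
Alternatively, a sketch of the same computation against the Scala RDD API (.rdd instead of .toJavaRDD), which takes plain Scala functions and compiles on both Scala versions:

import java.sql.Timestamp
import org.apache.spark.sql.Row

// The Scala API accepts ordinary Scala functions, so no SAM conversion is involved.
val minTimestamp: Timestamp = r_log_o.rdd.cache()
  .map((x: Row) => x(4).asInstanceOf[Timestamp])
  .reduce(minTs(_, _))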

puhlen