I get a strange compilation error with Scala 2.11 but not with Scala 2.12 (using Spark 2.2.1).
Here is my Scala code:
    import java.sql.Timestamp
    import org.apache.spark.sql.{Row, SparkSession}

    val spark = SparkSession.builder
      .master("local")
      .appName("spark rmd connect import")
      .enableHiveSupport()
      .getOrCreate()

    // LOAD
    var time = System.currentTimeMillis()
    val r_log_o = spark.read.format("orc").load("log.orc")
    val r_log = r_log_o.drop(r_log_o.col("id"))
    System.currentTimeMillis() - time

    time = System.currentTimeMillis()
    // This is the line that fails to compile under Scala 2.11
    r_log_o.toJavaRDD.cache().map((x: Row) => { x(4).asInstanceOf[Timestamp] }).reduce(minTs(_, _))
    System.currentTimeMillis() - time
where minTs is defined as:
    def minTs(x: Timestamp, y: Timestamp): Timestamp =
      if (x.compareTo(y) < 0) x else y
My pom.xml is configured as follows:
        <plugin>
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>3.3.1</version>
          <configuration>
            <scalaVersion>2.11</scalaVersion>
          </configuration>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.1</version>
          <configuration>
            <source>1.8</source>
            <target>1.8</target>
          </configuration>
        </plugin>
      </plugins>
    </build>
    <dependencies>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.12</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.2.1</version>
      </dependency>
      <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.2.1</version>
      </dependency>
    </dependencies>
If I set <scalaVersion>2.12</scalaVersion>, it compiles. With Scala 2.11 I get the following error:
    [INFO] /root/project/src/main/java:-1: info: compiling
    [INFO] /root/project/src/main/scala:-1: info: compiling
    [INFO] Compiling 2 source files to /root/rmd-connect-spark/target/classes at 1515426201592
    [ERROR] /root/rmd-connect-spark/src/main/scala/SparkConnectTest.scala:40: error: type mismatch;
    [ERROR] found : org.apache.spark.sql.Row => java.sql.Timestamp
    [ERROR] required: org.apache.spark.api.java.function.Function[org.apache.spark.sql.Row,?]
    [ERROR] .map((x : Row) => {x(4).asInstanceOf[Timestamp]})
    [ERROR] ^
    [ERROR] one error found
    [INFO]
    [INFO] BUILD FAILURE
    [INFO]
NOTE: this is not a Spark runtime problem. It is a compile-time problem with using Scala 2.11 against the Spark API.
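
To make the compile-time nature of the problem concrete, here is a minimal sketch that reproduces the same shape without Spark. It rests on an assumption: that the difference comes from Scala 2.12 converting function literals to single-abstract-method (SAM) interfaces automatically, which Scala 2.11 does not do by default. JFunction and mapAll are hypothetical stand-ins for org.apache.spark.api.java.function.Function and JavaRDD.map. They are not Spark classes.

```scala
import java.sql.Timestamp

// Hypothetical stand-in for org.apache.spark.api.java.function.Function:
// an interface with a single abstract method named call.
trait JFunction[T, R] extends Serializable {
  def call(t: T): R
}

object SamSketch {
  // Hypothetical helper playing the role of JavaRDD.map.
  def mapAll[T, R](xs: Seq[T], f: JFunction[T, R]): Seq[R] =
    xs.map(x => f.call(x))

  def minTs(x: Timestamp, y: Timestamp): Timestamp =
    if (x.compareTo(y) < 0) x else y

  def main(args: Array[String]): Unit = {
    // Each inner Seq stands in for a Row; index 1 holds the timestamp.
    val rows = Seq(
      Seq[Any]("a", new Timestamp(2000L)),
      Seq[Any]("b", new Timestamp(1000L))
    )

    // Explicit anonymous class: no SAM conversion needed,
    // so this form should compile on both 2.11 and 2.12.
    val extract = new JFunction[Seq[Any], Timestamp] {
      override def call(row: Seq[Any]): Timestamp =
        row(1).asInstanceOf[Timestamp]
    }

    val earliest = mapAll(rows, extract).reduce(minTs(_, _))
    println(earliest.getTime) // prints 1000

    // Function literal assigned to the interface type: relies on SAM
    // conversion, so it is expected to compile on 2.12 but not on 2.11.
    // val f: JFunction[Seq[Any], Timestamp] =
    //   (row: Seq[Any]) => row(1).asInstanceOf[Timestamp]
  }
}
```

If that assumption holds, implementing the Spark Function interface explicitly in the same way, or staying on the Scala RDD API via .rdd instead of .toJavaRDD, may avoid the type mismatch on 2.11.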