
I don't know if this is happening because Scala is so version-restrictive or because all the libraries are deprecated and not updated.

I have a small project in Scala Play with Apache Spark. I want to use the latest versions of the libraries, so I started the project with the versions below (a rough build.sbt sketch follows the list):

Scala v2.12.2
Play Framework v2.8.2
Apache Spark v3.0.0
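
Roughly, that first setup looked like this in build.sbt (a minimal sketch; Play itself is configured through its sbt plugin and is not shown here):

scalaVersion := "2.12.2"

val sparkVersion = "3.0.0"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion
)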

I need to read a CSV file, process it, and insert the data into an Impala/Kudu database. Using a JDBC connection and inserting the data with prepared statements is not a good option, because that way I don't use Apache Spark at its full power (I would only use it to read the file).

So I heard about KuduContext. I tried to install it, but surprise: KuduContext works only with Scala v2.11 and Apache Spark v2.4.6 (it says nothing about Play).
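
For context, the kind of KuduContext usage I was aiming for looks roughly like this (a minimal sketch based on the kudu-spark examples; the master address and table name are placeholders, and `spark` is an existing SparkSession):

import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.DataFrame

// "kudu-master:7051" and the table name below are placeholders.
val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

def insertToKudu(df: DataFrame): Unit =
  kuduContext.insertRows(df, "impala::default.my_table")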

I uninstalled Spark v3, then downloaded, installed, and set the environment variables for Spark v2.4.6 again, and created a new project with this configuration:

Scala v2.11.11
Play Framework v2.8.2
Apache Spark v2.4.6
KuduSpark2 v1.12.0

I found something incompatible with Play and downgraded it to 2.7. Later, I found some incompatibilities with the Jackson module:

java.lang.ExceptionInInitializerError
...
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.10-1

That required adding "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.5". Now, when I start the project and use SparkContext, I get another error:

java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$

...

Finally, my build.sbt ended up with:

Scala v2.11.11
Play Framework v2.8.2
Apache Spark v2.4.6
KuduSpark2 v1.12.0
jackson-module-scala v2.6.5

Some code:

import org.apache.spark.sql.SparkSession

object SparkContext {
  val spark = SparkSession
    .builder
    .appName("SparkApp")
    .master("local[*]")
    .config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
    .getOrCreate()

  val context = spark.sparkContext
}

SparkContext is used here:

val df = SparkContext.spark.read.csv(filePath) // the error occurs here
val lines = df.take(1001).map(mapper)
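
For completeness, `mapper` is not part of the problem (the error happens on the read.csv call before it is reached); it just turns each Row into my own model, roughly this shape (simplified sketch, not the real columns):

import org.apache.spark.sql.Row

// Simplified sketch only: the real mapper builds my own model from each Row.
case class CsvLine(first: String, second: String)

def mapper(row: Row): CsvLine =
  CsvLine(row.getString(0), row.getString(1))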

Is it really this hard to keep track of compatibility with other libraries when a new library version is released in this ecosystem? I found a lot of posts about version incompatibilities, but no solution. What am I missing here? Thanks.

AlleXyS

1 Answer


Damn, I found the solution:

libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.11.1"
libraryDependencies += "com.fasterxml.jackson.core" % "jackson-databind" % "2.11.1"

Besides jackson-module-scala, I also needed to add jackson-databind.

My build.sbt became:

scalaVersion := "2.11.11"
val sparkVersion = "2.4.6"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.kudu" %% "kudu-spark2" % "1.12.0"
)
libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.11.1"
libraryDependencies += "com.fasterxml.jackson.core" % "jackson-databind" % "2.11.1"

// project/plugins.sbt (sets the Play version)
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.7.5")
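
If you want to verify that the Jackson override really took effect, an optional sanity check is to print the Jackson version visible at runtime:

// Optional sanity check: should print the databind version, e.g. 2.11.1.
println(com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION)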

I really hope this helps somebody else who needs to use these libraries together and runs into issues like mine. I spent three hours finding a solution for a "simple" project configuration.

AlleXyS