I don't know whether this is happening because Scala is so restrictive about versions, or because the libraries are deprecated and no longer updated.
I have a small project in Scala Play with Apache Spark. I like to use the latest versions of libraries, so I started the project with:
Scala v2.12.2
Play Framework v2.8.2
Apache Spark v3.0.0
I need to read a CSV file, process it, and insert the result into an Impala Kudu database. Inserting the data over a JDBC connection with prepared statements is not an improvement, because then I am not using Apache Spark at its full power (I would use it only to read the file).
So I heard about KuduContext. I tried to install it, but surprise: KuduContext works only with Scala v2.11 and Apache Spark v2.4.6 (nothing is said about Play).
I uninstalled Spark v3, then downloaded and installed Spark v2.4.6 and set up its environment variables again. I created a new project with this configuration:
Scala v2.11.11
Play Framework v2.8.2
Apache Spark v2.4.6
KuduSpark2 v1.12.0
I found that something was incompatible with Play and downgraded it to 2.7. Later, I found some incompatibilities with the Jackson module:
java.lang.ExceptionInInitializerError
...
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.10-1
This required installing "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.5". Now, when I start the project and use the SparkContext, I get another error:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
...
Finally, my build.sbt became:
Scala v2.11.11
Play Framework v2.7 (downgraded from v2.8.2)
Apache Spark v2.4.6
KuduSpark2 v1.12.0
jackson-module-scala v2.6.5
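Written as an sbt fragment, that setup would look roughly like the sketch below. The artifact coordinates are the commonly published ones and are my assumption rather than a copy of the real file, the project name is hypothetical, and Play is taken as already downgraded to 2.7 as described above (the exact patch version is not stated).

```scala
// build.sbt (sketch only; coordinates assumed, not copied from the real project)
name := "spark-kudu-play" // hypothetical project name
scalaVersion := "2.11.11"

// Play itself comes from the sbt plugin declared in project/plugins.sbt,
// e.g. addSbtPlugin("com.typesafe.play" % "sbt-plugin" % <2.7 release>)
lazy val root = (project in file(".")).enablePlugins(PlayScala)

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.4.6",
  "org.apache.spark" %% "spark-sql"   % "2.4.6",
  "org.apache.kudu"  %% "kudu-spark2" % "1.12.0",
  // Added after the "Incompatible Jackson version" error
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.5"
)
```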
Some code:
import org.apache.spark.sql.SparkSession

// Note: this object shares its name with org.apache.spark.SparkContext.
object SparkContext {
  val spark = SparkSession
    .builder
    .appName("SparkApp")
    .master("local[*]")
    .config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
    .getOrCreate()

  val context = spark.sparkContext
}
The SparkContext object is used here:
val df = SparkContext.spark.read.csv(filePath) // here the error occurred
val lines = df.take(1001).map(mapper)
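For context, what I am ultimately trying to write with KuduContext is something like the following. It is only a sketch: the Kudu master address, table name and file path are placeholders, and I am assuming the two-argument KuduContext constructor and insertRows as described in the kudu-spark2 documentation. It also assumes the CSV columns already match the Kudu table schema, which in practice would need an explicit schema or casts.

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

object CsvToKudu {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("SparkApp")
      .master("local[*]")
      .getOrCreate()

    // Placeholder values, not taken from the real project.
    val kuduMaster = "kudu-master-host:7051"
    val tableName  = "impala::default.my_table"
    val filePath   = "data/input.csv"

    // Read the CSV as a DataFrame; the first row is assumed to hold column names.
    val df = spark.read.option("header", "true").csv(filePath)

    // Let Spark write the rows to Kudu directly instead of going through JDBC.
    val kuduContext = new KuduContext(kuduMaster, spark.sparkContext)
    kuduContext.insertRows(df, tableName)

    spark.stop()
  }
}
```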
Is it really so hard to keep a new library version compatible with the other libraries in this ecosystem? I found a lot of posts about version incompatibilities, but no solution. What am I missing here? Thanks.