I have a build.sbt like this, to do Spark programming:
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "3.0.1" withSources(),
      "com.datastax.spark" %% "spark-cassandra-connector" % "3.0.0" withSources()
      ...
    )
Since my program uses other libraries besides Spark itself, I have to use sbt assembly to generate an uber-jar that I can pass as an argument to spark-submit, in order to run the application on my Spark standalone cluster. The resulting uber-jar works like a charm.
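For reference, my current cycle looks roughly like this (the class name, master URL, and jar path are just placeholders, not my real values):

    sbt assembly
    spark-submit \
      --class com.example.MyApp \
      --master spark://master-host:7077 \
      target/scala-2.12/myapp-assembly-0.1.0.jar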
However, the build takes a lot of time, and I find this approach too slow for iterating during development. For every code change I want to test, I have to run another sbt build that produces the uber-jar, and each build takes very long (at least 5 minutes) before I can run it on my cluster.
I know that I could optimize the build.sbt a bit to speed up the build, but I think it would remain slow.
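The kind of tweak I have in mind is, for example, marking Spark itself as provided so that sbt-assembly does not bundle it (a sketch, same dependencies as above):

    // sketch: Spark is already installed on the standalone cluster,
    // so it does not need to be packed into the uber-jar
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "3.0.1" % "provided" withSources(),
      "com.datastax.spark" %% "spark-cassandra-connector" % "3.0.0" withSources()
    )

But even with that, every code change still means rebuilding the whole uber-jar.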
So my question is: are there other methods that avoid building an uber-jar entirely?
Ideally, I am thinking of a method where I would only have to run sbt package (a lot faster than sbt assembly), and then tell spark-submit, or the Spark standalone cluster, which additional jars to load.
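What I imagine is something along these lines (the jar path and the Maven coordinates are illustrative; I have not verified that this works for my case):

    sbt package
    spark-submit \
      --class com.example.MyApp \
      --master spark://master-host:7077 \
      --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 \
      target/scala-2.12/myapp_2.12-0.1.0.jar

or, instead of --packages, a --jars option with a comma-separated list of the dependency jars.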
However, the spark-submit documentation seems clear about this:

    application-jar: Path to a bundled jar including your application and all dependencies

... so maybe I have no other choice.
Any pointers to speed up my Spark development with Scala, SBT, and additional libraries?