I want to run different jobs on demand while sharing the same SparkContext, but I don't know how exactly I can do this.
I try to get the current context, but it seems to create a new SparkContext (with new executors).
I call spark-submit to add new jobs.
I run the code on Amazon EMR, with YARN as the resource manager.
My code:
import org.apache.spark.SparkContext

val sparkContext = SparkContext.getOrCreate()
val content = 1 to 40000
val result = sparkContext.parallelize(content, 5)
result.map(value => value.toString).foreach(loop)

// Busy loop to simulate some work per record
def loop(x: String): Unit = {
  for (a <- 1 to 30000000) {}
}
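To check whether the context is really new on every run, I assume printing the application ID inside each submitted job would show it (just a sketch):

// If each spark-submit prints a different application ID,
// then each run got its own SparkContext and its own executors
println(s"Spark application ID: ${sparkContext.applicationId}")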
spark-submit:
spark-submit --executor-cores 1 \
--executor-memory 1g \
--driver-memory 1g \
--master yarn \
--deploy-mode cluster \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.minExecutors=1 \
--conf spark.dynamicAllocation.maxExecutors=3 \
--conf spark.dynamicAllocation.initialExecutors=3 \
--conf spark.executor.instances=3 \
If I run spark-submit twice, it creates 6 executors, but I want to run all of these jobs in the same Spark application.
How can I add jobs to an existing Spark application?
I read about Spark JobServer (https://github.com/spark-jobserver/spark-jobserver), which achieves what I want to do, but I don't understand how it does this.
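What I imagine is a single long-running driver application that holds one SparkContext and runs incoming jobs on it, for example from separate threads. The sketch below is only my guess at that pattern; the object name, the hard-coded jobs, and the use of Futures are placeholders, not how JobServer actually works:

import org.apache.spark.SparkContext
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object SharedContextDriver {
  // A single context, created once when the application starts
  val sc = SparkContext.getOrCreate()
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Each call becomes a separate Spark job inside the same application,
  // so it reuses the executors that are already allocated
  def submitJob(n: Int): Future[Long] = Future {
    sc.parallelize(1 to n, 5).map(_.toString).count()
  }

  def main(args: Array[String]): Unit = {
    // In a real setup something like an HTTP endpoint or a queue would
    // trigger submitJob on demand instead of these hard-coded calls
    val jobs = Seq(submitJob(40000), submitJob(40000))
    jobs.foreach(Await.ready(_, Duration.Inf))
    sc.stop()
  }
}

Is something along these lines the intended way to do it, or is there a better mechanism for submitting jobs to an already running application?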