
I want to run different jobs on demand using the same Spark context, but I don't know exactly how I can do this.

I tried to get the current context, but it seems it creates a new Spark context (with new executors).

I call spark-submit to add new jobs.

I run the code on Amazon EMR, with YARN as the resource manager.

My code:

import org.apache.spark.SparkContext

val sparkContext = SparkContext.getOrCreate()
val content = 1 to 40000
val result = sparkContext.parallelize(content, 5)
result.map(value => value.toString).foreach(loop)

// Busy-wait so each task takes a noticeable amount of time
def loop(x: String): Unit = {
  for (a <- 1 to 30000000) {}
}

spark-submit:

spark-submit --executor-cores 1 \
             --executor-memory 1g \
             --driver-memory 1g \
             --master yarn \
             --deploy-mode cluster \
             --conf spark.dynamicAllocation.enabled=true \
             --conf spark.shuffle.service.enabled=true \
             --conf spark.dynamicAllocation.minExecutors=1 \
             --conf spark.dynamicAllocation.maxExecutors=3 \
             --conf spark.dynamicAllocation.initialExecutors=3 \
             --conf spark.executor.instances=3 \

If I run spark-submit twice, it creates 6 executors, but I want to run all these jobs in the same Spark application.

How can I achieve adding jobs to an existing Spark application?

I read about Spark JobServer (https://github.com/spark-jobserver/spark-jobserver), which achieves what I want to do, but I don't understand how it does this.

Cosmin

1 Answer


Spark JobServer does this programmatically using the SparkContext API. See here: https://github.com/spark-jobserver/spark-jobserver/blob/master/job-server/src/main/scala/spark/jobserver/JobManagerActor.scala#L288
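Conceptually, the trick is to keep one long-running driver alive that owns a single SparkContext and runs jobs handed to it at runtime, instead of starting a new YARN application with every spark-submit. Below is a minimal sketch of that idea, not JobServer's actual code: the SharedContextServer object, the submit helper, and the in-memory job queue are illustrative stand-ins for whatever front end (REST endpoint, actor system, etc.) you would expose to receive jobs.

import java.util.concurrent.LinkedBlockingQueue
import org.apache.spark.{SparkConf, SparkContext}

object SharedContextServer {

  // A "job" is just a function that runs against the shared SparkContext.
  type Job = SparkContext => Unit

  private val jobQueue = new LinkedBlockingQueue[Job]()

  // Callers enqueue work here instead of calling spark-submit again.
  def submit(job: Job): Unit = jobQueue.put(job)

  def main(args: Array[String]): Unit = {
    // One SparkContext for the lifetime of the application; every job reuses it,
    // so YARN sees a single application and a single set of executors.
    val sc = new SparkContext(new SparkConf().setAppName("shared-context-server"))

    // Example: enqueue two independent jobs against the same context.
    submit(ctx => println(ctx.parallelize(1 to 40000, 5).map(_.toString).count()))
    submit(ctx => println(ctx.parallelize(1 to 1000).sum()))

    // Server loop: take the next job and run it on the existing context.
    while (true) {
      jobQueue.take()(sc)
    }
  }
}

You would still launch this application once with spark-submit; after that, new jobs are sent to the running driver (JobServer does this over HTTP) rather than through additional spark-submit calls, so dynamic allocation can scale the same set of executors up and down across jobs.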

noorul