I have just started experimenting with the Spark JobServer and would like to use it in our production environment.

We usually run Spark jobs individually in yarn-client mode and would like to shift towards the paradigm offered by the Ooyala Spark JobServer.

I am able to run the WordCount examples shown on the official page. However, when I tried submitting our custom Spark job to the Spark JobServer, I got this error:

{
  "status": "ERROR",
  "result": {
    "message": "null",
    "errorClass": "scala.MatchError",
    "stack": [
      "spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:220)",
      "scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)",
      "scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)",
      "akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)",
      "akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)",
      "scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)",
      "scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)",
      "scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)",
      "scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)"
    ]
  }
}

I have made the necessary code modifications, such as extending SparkJob and implementing the runJob() method.
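For reference, my job class follows roughly this shape (the object name, config key, and logic below are placeholders, not our actual code):

package com.demo

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

// Placeholder job; the real job has more logic, but the overall shape is the same.
object MySparkJob extends SparkJob {

  // validate() must return a SparkJobValidation (SparkJobValid or SparkJobInvalid).
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.path")) SparkJobValid
    else SparkJobInvalid("Missing config key: input.path")

  // runJob() contains the job logic; its return value becomes the job result.
  override def runJob(sc: SparkContext, config: Config): Any = {
    sc.textFile(config.getString("input.path")).count()
  }
}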

This is the dev.conf file that I used:

# Spark Cluster / Job Server configuration
spark {
  # spark.master will be passed to each job's JobContext
  master = "yarn-client"

  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 4

  jobserver {
    port = 8090
    jar-store-rootdir = /tmp/jobserver/jars
    jobdao = spark.jobserver.io.JobFileDAO
    filedao {
      rootdir = /tmp/spark-job-server/filedao/data
    }

    context-creation-timeout = "60 s"
  }

  contexts {
    my-low-latency-context {
      num-cpu-cores = 1
      memory-per-node = 512m
    }
  }

  context-settings {
    num-cpu-cores = 2
    memory-per-node = 512m
  }

  home = "/data/softwares/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041"
}

spray.can.server {
    parsing.max-content-length = 200m
}

spark.driver.allowMultipleContexts = true
YARN_CONF_DIR=/home/spark/conf/

Also, how can I pass run-time parameters to the Spark job, such as --files and --jars? For example, I usually run our custom Spark job like this:

./spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041/bin/spark-submit --class com.demo.SparkDriver --master yarn-cluster --num-executors 3 --jars /tmp/api/myUtil.jar --files /tmp/myConfFile.conf,/tmp/mySchema.txt /tmp/mySparkJob.jar 
James Isaac

1 Answer

The number of executors and extra jars are passed in a different way: through the config file (see the dependent-jar-uris config setting), as sketched below.
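A minimal sketch, reusing the jar path from your spark-submit command; dependent-jar-uris goes under context-settings (or under a specific context) in the job server config:

context-settings {
  num-cpu-cores = 2
  memory-per-node = 512m
  # jars listed here are added to the classpath of every job run in this context
  dependent-jar-uris = ["file:///tmp/api/myUtil.jar"]
}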

YARN_CONF_DIR should be set in the environment, not in the .conf file.
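For example, export it in the shell that starts the job server (path taken from your config):

export YARN_CONF_DIR=/home/spark/conf/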

As for other issues, the Google group is the right place to ask. You may want to search it for yarn-client issues, as several other folks have figured out how to get it to work.

Evan Chan
  • Thanks a lot for your answer. Is there any spark-job-server-specific config setting for passing files (such as config files or schema files) at runtime? I understand the dependent-jar-uris property is used for passing additional jar files. – James Isaac Apr 14 '15 at 13:33
  • @JamesIsaac not right now, but that's an interesting suggestion. Want to file an issue? – Evan Chan Apr 17 '15 at 14:18