
I am trying to build an application that submits Spark jobs remotely and monitors the status of each submitted job. I found http://arturmkrtchyan.com/apache-spark-hidden-rest-api, which describes a REST API for submitting jobs and fetching their status. However, when I submit a job, it is accepted and a submission ID is returned, but the driver then fails with a NullPointerException while fetching the jar from the remotely hosted URL.

Request:

    curl -X POST http://<sparkmaster-url>:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
        "action" : "CreateSubmissionRequest",
        "appArgs" : [ "prod", 960 ],
        "appResource" : "http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar",
        "clientSparkVersion" : "2.0.2",
        "mainClass" : "myclass",
        "sparkProperties" : {
          "spark.jars" : "http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar",
          "spark.app.name" : "myclass",
          "spark.submit.deployMode" : "cluster",
          "spark.master" : "spark://<spark_master_ip>:6066"
        }
      }'
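For context, the blog post above also describes a status endpoint in the same REST API, which is what I use for monitoring. A minimal sketch of how I poll it (the master URL is a placeholder; the submission ID is the one returned by the create call):

```shell
# Build the status URL for a submission returned by /v1/submissions/create.
# <sparkmaster-url> is a placeholder for the actual master host.
MASTER="http://<sparkmaster-url>:6066"
SUBMISSION_ID="driver-20170330132537-0040"

STATUS_URL="$MASTER/v1/submissions/status/$SUBMISSION_ID"
echo "$STATUS_URL"

# Actual call (requires a reachable master):
# curl -s "$STATUS_URL"
```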

This is the error I see in the log of one of the workers:

    2017-03-30 13:25:37,495 INFO  [Worker] - Asked to launch driver driver-20170330132537-0040
    2017-03-30 13:25:37,504 INFO  [DriverRunner] - Copying user jar http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar to /home/ubuntu/spark/work/driver-20170330132537-0040/myapp-assembly-1.0.jar 
    2017-03-30 13:25:37,850 INFO  [Utils] - Fetching http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar to /home/ubuntu/spark/work/driver-20170330132537-0040/fetchFileTemp1748077866083410926.tmp
    2017-03-30 13:26:00,821 WARN  [Worker] - Driver driver-20170330132537-0040 failed with unrecoverable exception: java.lang.NullPointerException

However, this same job runs successfully when submitted via the spark-submit command. Here is the log from one of the workers for that run:

    2017-03-30 15:17:02,884 INFO  [Worker] - Asked to launch driver driver-20170330151702-0054
    2017-03-30 15:17:02,893 INFO  [DriverRunner] - Copying user jar  http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar to /home/ubuntu/spark/work/driver-20170330151702-0054/myapp-assembly-1.0.jar
    2017-03-30 15:17:03,243 INFO  [Utils] - Fetching http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar to /home/ubuntu/spark/work/driver-20170330151702-0054/fetchFileTemp4438219504638828769.tmp
    2017-03-30 15:17:22,266 INFO  [DriverRunner] - Launch Command: "/usr/lib/jvm/java-8-oracle/bin/java" "-cp" "/home/ubuntu/spark-statsd/target/scala-2.11/spark-statsd-1.0.0.jar:/home/ubuntu/spark/conf/:/home/ubuntu/spark/jars/*" "-Xmx4096M" "-Dspark.kryo.registrator=serializer.CerebroKryoSerializer" "-Dspark.submit.deployMode=cluster" "-Dspark.executor.memory=15G" "-Dspark.shuffle.consolidateFiles=true" "-Dspark.driver.extraClassPath=/home/ubuntu/spark-statsd/target/scala-2.11/spark-statsd-1.0.0.jar" "-Dspark.app.name=myclass" "-Dspark.executor.instances=16" "-Dspark.master=spark://<spark_master_ip>:7077" "-Dspark.executor.cores=4" "-Dspark.serializer=org.apache.spark.serializer.KryoSerializer" "-Dspark.jars=http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar" "-Dspark.driver.supervise=false" "-Dspark.shuffle.file.buffer=400" "-Dspark.kryoserializer.buffer=256" "-Dspark.driver.memory=4G" "-Dspark.default.parallelism=48" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@<spark_worker>:39704" "/home/ubuntu/spark/work/driver-20170330151700-0053/myapp-assembly-1.0.jar" "myclass" "prod" "960"
    2017-03-30 15:17:24,047 INFO  [Worker] - Asked to launch executor app-20170330151724-0028/1 for myclass$
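For reference, the working spark-submit invocation looked roughly like this (reconstructed from the launch command in the log above; the exact flag spelling and any settings not visible in the log are assumptions):

```shell
# Reconstructed spark-submit command; placeholder hosts kept as-is.
# Resource values are taken from the -D properties in the launch command.
SUBMIT_CMD="spark-submit \
  --master spark://<spark_master_ip>:7077 \
  --deploy-mode cluster \
  --class myclass \
  --driver-memory 4G \
  --executor-memory 15G \
  --executor-cores 4 \
  http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar \
  prod 960"
echo "$SUBMIT_CMD"
```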

Can you help me figure out what I am missing here? Thanks in advance!
