I am trying to build an application that submits Spark jobs remotely and monitors the status of the submitted jobs. I found http://arturmkrtchyan.com/apache-spark-hidden-rest-api, which describes a REST API for submitting jobs and fetching their status. However, when I try to submit a job, it is accepted and a submission ID is returned, but the worker then throws a NullPointerException while fetching the jar from a remotely hosted URL.
Request:
curl -X POST http://<sparkmaster-url>:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "prod", 960 ],
  "appResource" : "http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar",
  "clientSparkVersion" : "2.0.2",
  "mainClass" : "myclass",
  "sparkProperties" : {
    "spark.jars" : "http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar",
    "spark.app.name" : "myclass",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://<spark_master_ip>:6066"
  }
}'
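For reference, this is roughly how my application issues the request. A minimal Python sketch; `build_request_body` and `submit` are illustrative names, and the host names and jar URL are the same placeholders as in the curl call above:

```python
import json
from urllib import request

# Placeholders, same as in the curl call above.
MASTER_REST_URL = "http://<sparkmaster-url>:6066"
JAR_URL = "http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar"

def build_request_body(app_args):
    """Build the JSON body for POST /v1/submissions/create."""
    return {
        "action": "CreateSubmissionRequest",
        "appArgs": app_args,
        "appResource": JAR_URL,
        "clientSparkVersion": "2.0.2",
        "mainClass": "myclass",
        "sparkProperties": {
            "spark.jars": JAR_URL,
            "spark.app.name": "myclass",
            "spark.submit.deployMode": "cluster",
            "spark.master": "spark://<spark_master_ip>:6066",
        },
    }

def submit(app_args):
    """POST the request and return the parsed response from the master."""
    data = json.dumps(build_request_body(app_args)).encode("utf-8")
    req = request.Request(
        MASTER_REST_URL + "/v1/submissions/create",
        data=data,
        headers={"Content-Type": "application/json;charset=UTF-8"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The body produced by `build_request_body` is byte-for-byte equivalent in content to the `--data` payload above.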
This is the error I see in the log of the worker that was asked to launch the driver:
2017-03-30 13:25:37,495 INFO [Worker] - Asked to launch driver driver-20170330132537-0040
2017-03-30 13:25:37,504 INFO [DriverRunner] - Copying user jar http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar to /home/ubuntu/spark/work/driver-20170330132537-0040/myapp-assembly-1.0.jar
2017-03-30 13:25:37,850 INFO [Utils] - Fetching http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar to /home/ubuntu/spark/work/driver-20170330132537-0040/fetchFileTemp1748077866083410926.tmp
2017-03-30 13:26:00,821 WARN [Worker] - Driver driver-20170330132537-0040 failed with unrecoverable exception: java.lang.NullPointerException
However, the same job runs successfully when submitted via the spark-submit command. Worker log from the successful run:
2017-03-30 15:17:02,884 INFO [Worker] - Asked to launch driver driver-20170330151702-0054
2017-03-30 15:17:02,893 INFO [DriverRunner] - Copying user jar http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar to /home/ubuntu/spark/work/driver-20170330151702-0054/myapp-assembly-1.0.jar
2017-03-30 15:17:03,243 INFO [Utils] - Fetching http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar to /home/ubuntu/spark/work/driver-20170330151702-0054/fetchFileTemp4438219504638828769.tmp
2017-03-30 15:17:22,266 INFO [DriverRunner] - Launch Command: "/usr/lib/jvm/java-8-oracle/bin/java" "-cp" "/home/ubuntu/spark-statsd/target/scala-2.11/spark-statsd-1.0.0.jar:/home/ubuntu/spark/conf/:/home/ubuntu/spark/jars/*" "-Xmx4096M" "-Dspark.kryo.registrator=serializer.CerebroKryoSerializer" "-Dspark.submit.deployMode=cluster" "-Dspark.executor.memory=15G" "-Dspark.shuffle.consolidateFiles=true" "-Dspark.driver.extraClassPath=/home/ubuntu/spark-statsd/target/scala-2.11/spark-statsd-1.0.0.jar" "-Dspark.app.name=myclass" "-Dspark.executor.instances=16" "-Dspark.master=spark://<spark_master_ip>:7077" "-Dspark.executor.cores=4" "-Dspark.serializer=org.apache.spark.serializer.KryoSerializer" "-Dspark.jars=http://<jenkins_url>/job/myjob/ws/target/scala-2.11/myapp-assembly-1.0.jar" "-Dspark.driver.supervise=false" "-Dspark.shuffle.file.buffer=400" "-Dspark.kryoserializer.buffer=256" "-Dspark.driver.memory=4G" "-Dspark.default.parallelism=48" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@<spark_worker>:39704" "/home/ubuntu/spark/work/driver-20170330151700-0053/myapp-assembly-1.0.jar" "myclass" "prod" "960"
2017-03-30 15:17:24,047 INFO [Worker] - Asked to launch executor app-20170330151724-0028/1 for myclass$
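For completeness, my monitor polls GET /v1/submissions/status/&lt;submission-id&gt; as described in the linked post. Below is a sketch of how I interpret the response; the field names follow that post, the sample values are illustrative (using the submission ID from the failed run), and the set of terminal driver states is my assumption:

```python
import json

# Illustrative status response, shaped after the linked blog post
# (GET /v1/submissions/status/<submission-id>). "success" refers to the
# status request itself, not to the driver.
sample_response = json.loads("""
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "ERROR",
  "serverSparkVersion" : "2.0.2",
  "submissionId" : "driver-20170330132537-0040",
  "success" : true
}
""")

# Assumed terminal driver states; adjust if your Spark version reports others.
TERMINAL_STATES = {"FINISHED", "ERROR", "KILLED", "FAILED"}

def is_terminal(status_response):
    """True once the driver has stopped, so polling can end."""
    return status_response.get("driverState") in TERMINAL_STATES
```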
Can you help me figure out what I am missing here? Thanks in advance!