
I'm submitting Spark jobs to a Livy (0.6.0) session through curl.

The job is a big JAR file with a class that extends the Job interface, exactly like this: https://stackoverflow.com/a/49220879/8557851

I am running it with this curl command:

curl -X POST -d '{"kind": "spark","files":["/config.json"],"jars":["/myjar.jar"],"driverMemory":"512M","executorMemory":"512M"}' -H "Content-Type: application/json" localhost:8998/sessions/

The code itself is exactly like the answer linked above:

package com.mycompany.test

import org.apache.livy.{Job, JobContext}
import org.apache.spark._
import org.apache.livy.scalaapi._

object Test extends Job[Boolean] {
  override def call(jc: JobContext): Boolean = {
    val sc = jc.sc
    sc.getConf.getAll.foreach(println)
    true
  }
}

The error is a java.lang.NullPointerException, as shown below:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.livy.rsc.driver.JobWrapper.cancel(JobWrapper.java:90)
    at org.apache.livy.rsc.driver.RSCDriver.shutdown(RSCDriver.java:127)
    at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:356)
    at org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The expected output is for the job inside the JAR to start running.

2 Answers

I've used the Livy REST APIs, and there are two approaches to submitting a Spark job. Please refer to the REST API docs; they give a fair understanding of Livy REST requests:

1. Batch (/batches):
You submit a request and get back a job id; based on that job id, you poll for the status of the Spark job. Here you have the option to execute an uber JAR as well as a code file, but I've never used the latter.

2. Session (/sessions and /sessions/{sessionId}/statements):
You submit the Spark job as code, so there is no need to create an uber JAR. Here, you first create a session, and in that session you execute statements (the actual code).

For both approaches, the documentation has a nice explanation of the corresponding REST requests and the request body/parameters.

Examples/samples are here and here.

The correction to your code would be:

Batch

curl \
  -X POST \
  -d '{
    "kind": "spark",
    "files": [
      "<use-absolute-path>"
    ],
    "file": "absolute-path-to-your-application-jar",
    "className": "fully-qualified-spark-class-name",
    "driverMemory": "512M",
    "executorMemory": "512M",
    "conf": {<any-other-configs-as-key-val>}
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/batches/
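
After submitting, Livy returns a batch id in the response's id field, which you then poll for the job's status. A minimal sketch, assuming the returned id was 0:

// Poll the state of batch 0
curl localhost:8998/batches/0/state

// Fetch the driver log of batch 0
curl localhost:8998/batches/0/log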

Session and Statement

// Create a session
curl \
  -X POST \
  -d '{
    "kind": "spark",
    "files": [
      "<use-absolute-path>"
    ],
    "driverMemory": "512M",
    "executorMemory": "512M",
    "conf": {<any-other-configs-as-key-val>}
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/
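
Session startup is asynchronous, so poll the session until it reports the idle state before running statements. A minimal sketch, assuming the creation response returned id 0:

// Poll the state of session 0; wait until it is "idle"
curl localhost:8998/sessions/0/state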

// Run code/statement in session created above
curl \
  -X POST \
  -d '{
    "kind": "spark",
    "code": "spark-code"
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/{sessionId}/statements
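
Statements are also asynchronous: the POST returns a statement id, which you poll until its state is "available" and then read the output field. A minimal sketch, assuming statement id 0:

// Poll statement 0; its output appears once its state is "available"
curl localhost:8998/sessions/{sessionId}/statements/0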
– kode
  • What about sending a job as in the example above? I can't be sending code each time through curl; that's why I have a JAR. The question was how to get that JAR to work. – Ahmed Adnane A'mil Sep 02 '19 at 13:12
  • @AhmedAdnaneA'mil: I think it is pretty clear from the approaches I have mentioned. If you want to go the uber JAR way, refer to the Batch approach; a curl example for it is also provided. – kode Sep 02 '19 at 13:48
  • Your comment was pretty informative, but it doesn't answer the question asked. The problem persists even with the batch command. I think the problem is with the JAR file. Any idea how I should be compiling my JAR / writing my code? – Ahmed Adnane A'mil Sep 03 '19 at 09:55
  • Can I use an S3 path to the JAR file? – yegeniy Feb 19 '20 at 05:42
  • @yegeniy Yes, you can use an S3 path to the JAR file. – kode Feb 21 '20 at 05:26
  • Thank you @Kode. It turned out that my issue was actually with LIVY-636. I needed to omit Scala itself from the jar's assembly. – yegeniy Feb 21 '20 at 14:15

As @yegeniy mentioned above, the problem came from LIVY-636: you need to build the JAR without the Scala libraries, and everything will work smoothly.
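
A minimal sketch of how to exclude Scala from the uber JAR, assuming the project is built with sbt-assembly (0.14.x syntax; the plugin version below is an assumption):

// project/plugins.sbt (assumed plugin version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt: keep the Scala library out of the JAR produced by `sbt assembly`
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)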