
What I intend to achieve is to have a Scala Spark program (in a jar) receive a POST message from a client, e.g. curl, take some argument values, do some Spark processing and then return a result value to the calling client. From the available Apache Livy documentation I cannot find a way to invoke a compiled and packaged Spark program from a client (e.g. curl) via Livy in interactive, i.e. session, mode. Such a request/reply scenario via Livy can be done with Scala code passed in plain text to the Spark shell, like this:

curl -k --user "admin:mypassword" -v \
-H "Content-Type: application/json" -X POST \
-d @Curl-KindSpark_ScalaCode01.json \
"https://myHDI-Spark-Clustername.azurehdinsight.net/livy/sessions/0/statements" \
-H "X-Requested-By: admin"

Instead of passing Scala source code as data (-d @Curl-KindSpark_ScalaCode01.json), I would rather pass the path and filename of a jar file, a class name and argument values. But how?

  • I am also trying this, but my JAR file is on the local hard drive and I am using Livy locally. Can I invoke the JAR using a session? I can run the job using batches, but I want to do it using sessions. Can you help me with this? – Abhishek Sengupta Sep 15 '20 at 19:42

2 Answers

  • Make an uber jar of your Spark app with the sbt-assembly plugin.

  • Upload the jar file from the previous step to your HDFS cluster: hdfs dfs -put /home/hduser/PiNumber.jar /user/hduser

  • Start an interactive session via Livy with the jar attached: curl -X POST -d '{"kind": "spark", "jars": ["hdfs://localhost:8020/user/hduser/PiNumber.jar"]}' -H "Content-Type: application/json" -H "X-Requested-By: user" localhost:8998/sessions

  • Check the result of a statement posted to that session (the invocation itself is sketched below this list): curl localhost:8998/sessions/0/statements/3

{"id":3,"state":"available","output":{"status":"ok","execution_count":3,"data":{"text/plain":"Pi is roughly 3.14256"}}}

P.S. Livy's Scala/Java API requires an uber jar file. sbt-assembly doesn't build a fat jar instantly, which annoys me, so I usually use Livy's Python API for smoke tests and tweaking.

Sanity checks with Python:

  • curl localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"print(\"Sanity check for Livy\")"}'

You can put more complicated logic into the code field. By the way, this is how popular notebooks for Spark work: they send the source code to the cluster via Livy.
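Fetching the result back follows the same pattern as in the Scala example above (statement id 0 here assumes it was the first statement of the session):

curl localhost:8998/sessions/0/statements/0

The reply has the same shape as the JSON shown earlier: an id, a state, and an output object carrying the printed text.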


Thanks, I will try this out. In the meanwhile I found another solution:

curl -k --user "admin:" -v \
-H "Content-Type: application/json" -X POST \
-d @Curl-KindSpark_BrandSampleModel_SessionSetup.json \
"https://mycluster.azurehdinsight.net/livy/sessions"

with a JSON file containing

{ "kind": "spark", "jars": ["adl://skylytics1.azuredatalakestore.net/skylytics11/azuresparklivy_2.11-0.1.jar"] }

and with the jar containing the Scala object uploaded to the Azure Data Lake Gen1 account. Then post the statement

curl -k --user "admin:myPassword" -v \
-H "Content-Type: application/json" -X POST \
-d @Curl-KindSpark_BrandSampleModel_CodeSubmit.json \
"https://mycluster.azurehdinsight.net/livy/sessions/4/statements" \
-H "X-Requested-By: admin"

with the content

{ "code": "import AzureSparkLivy_GL01._; val brandModelSamplesFirstModel = AzureSparkLivyFunction.SampleModelOfBrand(sc, \"Honda\"); brandModelSamplesFirstModel" }

So I told Livy to start an interactive Spark session and load the specified jar, then passed some code to invoke a member of the object in the jar. It works. I will check your advice too.
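For reference, this pattern only requires the jar to expose an object reachable from the statement's import. A rough sketch of what azuresparklivy_2.11-0.1.jar could contain (only the names come from the statement above; the body is invented for illustration):

import org.apache.spark.SparkContext

// Hypothetical layout matching "import AzureSparkLivy_GL01._"
object AzureSparkLivy_GL01 {
  object AzureSparkLivyFunction {
    // Uses the SparkContext that the Livy session exposes as sc
    def SampleModelOfBrand(sc: SparkContext, brand: String): Array[String] = {
      // Illustrative body: sample a couple of models for the given brand
      val models = sc.parallelize(Seq("Honda Civic", "Honda Accord", "Toyota Corolla"))
      models.filter(_.startsWith(brand)).takeSample(withReplacement = false, 2)
    }
  }
}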

Gerold