
Using Amazon emr-5.30.1 with Livy 0.7 and Spark 2.4.5

We want to use Apache Livy as a REST service for Spark, working in session mode rather than batch mode. We are trying to add a jar to the session (through the official API) using:

curl -X POST \
     -d '{"conf": {"kind" : "spark","jars": "s3://cjspro-emr-data/spark-examples.jar"}}' \
     -H "Content-Type: application/json" localhost:8998/sessions

Looking at the session logs gives the impression that the jar is not being uploaded, and code snippets that use classes from the requested jar do not work.

Any help?

Hakan Dilek

1 Answer


I am not sure whether referencing the jar directly from S3 works, but we achieved the same thing using a bootstrap action and updating the Spark config.

Step 1: Create a bootstrap script and add the following command to it:

aws s3 cp s3://cjspro-emr-data/spark-examples.jar /home/hadoop/jars/
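To make this concrete, here is a minimal boto3 sketch of attaching that bootstrap script when launching the cluster; the script location copy-jars.sh, the region, the instance sizing, and the IAM roles are placeholders I am assuming, not something from the original setup.

import boto3

emr = boto3.client("emr", region_name="us-east-1")   # region is an assumption

# Launch the cluster with the bootstrap script attached so that every node
# (driver and executors) gets the jar under /home/hadoop/jars/ at startup.
response = emr.run_job_flow(
    Name="livy-session-cluster",                       # placeholder name
    ReleaseLabel="emr-5.30.1",
    Applications=[{"Name": "Spark"}, {"Name": "Livy"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",             # placeholder sizing
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    BootstrapActions=[
        {
            "Name": "copy-spark-examples-jar",
            "ScriptBootstrapAction": {
                "Path": "s3://cjspro-emr-data/copy-jars.sh"   # hypothetical script location
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])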

Step 2: While creating the Livy session, set the following Spark config using the conf key of the Livy sessions API:

'conf': {'spark.driver.extraClassPath': '/home/hadoop/jars/*',
         'spark.executor.extraClassPath': '/home/hadoop/jars/*'}

Step 3: Pass the jars to be added to the session using the jars key of the Livy sessions API:

'jars': ['local:/home/hadoop/jars/spark-examples.jar']

So the final payload to create the Livy session would look like this:

{
  'kind': 'pyspark',
  'conf': {'spark.driver.extraClassPath': '/home/hadoop/jars/*',
           'spark.executor.extraClassPath': '/home/hadoop/jars/*'},
  'jars': ['local:/home/hadoop/jars/spark-examples.jar'],
  'executorCores': '',
  'executorMemory': '',
  ...
}
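Putting the pieces together, here is a hedged Python sketch of creating the session over Livy's REST API and checking that the jar is on the classpath. The http://localhost:8998 endpoint assumes you run it on the EMR master node (as in the question), and the Class.forName check against org.apache.spark.examples.SparkPi is a hypothetical example, not part of the original answer.

import json
import time

import requests

LIVY_URL = "http://localhost:8998"          # Livy endpoint (run on the EMR master node)
HEADERS = {"Content-Type": "application/json"}

# Session payload combining steps 2 and 3: the classpath entries point at the
# directory populated by the bootstrap script, and `jars` lists the local copy.
payload = {
    "kind": "pyspark",
    "conf": {
        "spark.driver.extraClassPath": "/home/hadoop/jars/*",
        "spark.executor.extraClassPath": "/home/hadoop/jars/*",
    },
    "jars": ["local:/home/hadoop/jars/spark-examples.jar"],
}

resp = requests.post(f"{LIVY_URL}/sessions", data=json.dumps(payload), headers=HEADERS)
resp.raise_for_status()
session_id = resp.json()["id"]

# Poll until the session leaves the "starting" state before submitting code.
while True:
    state = requests.get(f"{LIVY_URL}/sessions/{session_id}", headers=HEADERS).json()["state"]
    if state in ("idle", "error", "dead"):
        break
    time.sleep(5)
print(f"session {session_id} is {state}")

# Hypothetical sanity check: resolving a class from the jar on the driver
# raises ClassNotFoundException if the jar is not actually on the classpath.
code = 'print(sc._jvm.java.lang.Class.forName("org.apache.spark.examples.SparkPi"))'
stmt = requests.post(
    f"{LIVY_URL}/sessions/{session_id}/statements",
    data=json.dumps({"code": code}),
    headers=HEADERS,
)
print(stmt.json())   # poll GET /sessions/{id}/statements/{id} for the result

Submitting a statement like this is also a quick way to confirm the extraClassPath settings took effect, since the statement output will contain the ClassNotFoundException if they did not.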
satish silveri