I'd like to execute a Spark job via an HTTP call from outside the cluster, using Livy, where the Spark jar already exists in HDFS.

I'm able to spark-submit the job from a shell on the cluster nodes, e.g.:

spark-submit --class io.woolford.Main --master yarn-cluster hdfs://hadoop01:8020/path/to/spark-job.jar

Note that the --master yarn-cluster flag is necessary to access HDFS, where the jar resides.

I'm also able to submit commands via Livy using curl. For example, this request:

curl -X POST --data '{"file": "/path/to/spark-job.jar", "className": "io.woolford.Main"}' -H "Content-Type: application/json" hadoop01:8998/batches

... executes the following command on the cluster:

spark-submit --class io.woolford.Main hdfs://hadoop01:8020/path/to/spark-job.jar

This is the same as the command that works, minus the --master yarn-cluster option. I verified this by tailing /var/log/livy/livy-livy-server.out.

So I just need to modify the curl command so that --master yarn-cluster is included when Livy executes the job. At first glance, it seems like this should be possible by adding arguments to the JSON dictionary. Unfortunately, those arguments aren't passed through to spark-submit.
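
For illustration, this is the kind of request I mean; the "args" field here is just my guess at how the flag might be passed, and additions like this didn't change the spark-submit command that Livy generated:

curl -X POST --data '{"file": "/path/to/spark-job.jar", "className": "io.woolford.Main", "args": ["--master", "yarn-cluster"]}' -H "Content-Type: application/json" hadoop01:8998/batches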

Does anyone know how to pass --master yarn-cluster to Livy, so that jobs run on YARN, without making system-wide changes?

Alex Woolford

2 Answers


I recently tried something similar to what you describe: I needed to send an HTTP request to Livy's API, with Livy already installed on a YARN cluster, and have Livy start a Spark job.

My command to call Livy did not include --master yarn-cluster, but it seemed to work for me. Maybe you can try putting your JAR file on the local filesystem instead of in the cluster?
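
For what it's worth, the request in that case might look something like this; the path is hypothetical, and whether Livy accepts local files may depend on how the server is configured (e.g. livy.file.local-dir-whitelist):

curl -X POST --data '{"file": "file:///opt/jobs/spark-job.jar", "className": "io.woolford.Main"}' -H "Content-Type: application/json" hadoop01:8998/batches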

john zhang

Set spark.master = yarn-cluster in the Spark conf; for me that's /etc/spark2/conf/spark-defaults.conf.
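
In the file itself the entry would look something like this (that path is from my install; adjust for your distribution), after which the original curl request should launch in yarn-cluster mode without any extra fields:

# /etc/spark2/conf/spark-defaults.conf
spark.master    yarn-cluster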

zyfo2