I would like to add some configuration when a Spark job is submitted via Apache Livy on an Azure HDInsight cluster. Currently, to launch a Spark job via Apache Livy on the cluster, I use the following command:
curl -X POST --data '{"file": "/home/xxx/lib/MyJar.jar", "className": "org.springframework.boot.loader.JarLauncher"}' -H "Content-Type: application/json" localhost:8998/batches
This command generates the following process:
……. org.apache.spark.deploy.SparkSubmit --conf spark.master=yarn-cluster --conf spark.yarn.tags=livy-batch-51-qHXmHXWg --conf spark.yarn.submit.waitAppCompletion=false --class org.springframework.boot.loader.JarLauncher adl://home/home/xxx/lib/MyJar.jar
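(This is the spark-submit line as seen on the headnode; such a line can be captured with a standard process listing, for example:)

ps -ef | grep org.apache.spark.deploy.SparkSubmit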
Due to a technical issue when running the jar, I need to introduce two configurations into this command:
--conf "spark.driver.extraClassPath=/home/xxx/lib /jars/*"
--conf "spark.executor.extraClassPath=/home/xxx/lib/jars/*"
It's related to a Logback issue when running on Spark, which uses Log4j 2; the extra classpath adds the Logback jars.
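For reference, the Livy batch API also accepts a conf map in the JSON body, so the same two settings could in principle be passed directly at submission time. A sketch of such a request (I have not validated this on our HDInsight cluster):

curl -X POST --data '{
  "file": "/home/xxx/lib/MyJar.jar",
  "className": "org.springframework.boot.loader.JarLauncher",
  "conf": {
    "spark.driver.extraClassPath": "/home/xxx/lib/jars/*",
    "spark.executor.extraClassPath": "/home/xxx/lib/jars/*"
  }
}' -H "Content-Type: application/json" localhost:8998/batches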
I found here (https://groups.google.com/a/cloudera.org/forum/#!topic/hue-user/fcRM3YiqAAA) that this can be done by adding the configuration to LIVY_SERVER_JAVA_OPTS or to spark-defaults.conf.
From Ambari, I modified LIVY_SERVER_JAVA_OPTS in livy-env.sh (in the Spark2 & Livy menu) and Advanced spark2-defaults in Spark2.
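Concretely, the changes were of this form (paths shortened; this is a sketch of the syntax used, not the exact cluster values):

# livy-env.sh (Ambari, Spark2 & Livy menu) - settings passed as -D system properties
export LIVY_SERVER_JAVA_OPTS="$LIVY_SERVER_JAVA_OPTS -Dspark.driver.extraClassPath=/home/xxx/lib/jars/* -Dspark.executor.extraClassPath=/home/xxx/lib/jars/*"

# Advanced spark2-defaults (spark-defaults.conf) - standard key/value pairs
spark.driver.extraClassPath   /home/xxx/lib/jars/*
spark.executor.extraClassPath /home/xxx/lib/jars/*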
Unfortunately, this is not working on our side, even though I can see that the Livy server is launched with -Dspark.driver.extraClassPath.
Is there any specific configuration to add in Azure HDInsight to make it work?
Note that the resulting process should look like this:
……. org.apache.spark.deploy.SparkSubmit --conf spark.master=yarn-cluster --conf spark.yarn.tags=livy-batch-51-qHXmHXWg --conf spark.yarn.submit.waitAppCompletion=false **--conf "spark.driver.extraClassPath=/home/xxx/lib/jars/*" --conf "spark.executor.extraClassPath=/home/xxx/lib/jars/*"** --class org.springframework.boot.loader.JarLauncher adl://home/home/xxx/lib/MyJar.jar
Thx