I want to run a Spark Streaming application on a YARN cluster on a remote server. The default Java version there is 1.7, but I want to use 1.8 for my application, which is also installed on the server but is not the default. Is there a way to specify, through spark-submit, the location of Java 1.8 so that I do not get the major.minor version error?
- You use Maven? If so, you can specify the Java version in the pom.xml. – M. Suurland Apr 26 '16 at 11:30
- Maybe you can set JAVA_HOME just before you spark-submit, like this: "JAVA_HOME=/path/to/java ./bin/spark-submit ......" – Hlib Apr 26 '16 at 11:40
- Setting JAVA_HOME before the spark-submit command worked for me. Thanks :) – Priyanka Apr 26 '16 at 12:43
- @Hlib, doing so changed the Java version for the driver of the current application, but not for the executors in the cluster, which still have 1.7 as their default. Can you suggest a workaround for that as well? – Priyanka Apr 27 '16 at 05:40
- Did you try to specify JAVA_HOME in $SPARK_HOME$/conf/spark-env.sh? – Hlib Apr 27 '16 at 08:21
- Or it may be better to put it here: $HADOOP_HOME$/etc/hadoop/yarn-env.sh – Hlib Apr 27 '16 at 08:31
- But that would affect other applications running in the same cluster, so I changed my code to run with Java 7. Thanks :) – Priyanka Apr 27 '16 at 09:21
5 Answers
JAVA_HOME alone was not enough in our case: the driver was running on Java 8, but I discovered later that the Spark workers on YARN were launched with Java 7 (the Hadoop nodes have both Java versions installed).

I had to add spark.executorEnv.JAVA_HOME=/usr/java/<version available in workers> to spark-defaults.conf. Note that you can also provide it on the command line with --conf.

See http://spark.apache.org/docs/latest/configuration.html#runtime-environment
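As a concrete sketch of the above (the JDK path below is just an example; use whatever Java 8 path actually exists on your worker nodes), the setting can go either in spark-defaults.conf or on the spark-submit command line:

    # $SPARK_HOME/conf/spark-defaults.conf  (key and value separated by whitespace)
    spark.executorEnv.JAVA_HOME    /usr/java/jdk1.8.0_121

    # or, equivalently, at submit time
    spark-submit --conf spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0_121 ...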

- For those who don't have access/permission to check the Java version on the worker nodes, use `spark.range(0, 100).mapPartitions(_.map(_ => java.lang.System.getProperty("java.version"))).show` as a sanity check. It can be hard to determine the runtime Java version via the YARN / Spark UI. – shay__ Jan 02 '18 at 12:52
- Both _spark.executorEnv.JAVA_HOME_ and _spark.yarn.appMasterEnv.JAVA_HOME_ need to be set. – Avinash Ganta Nov 15 '19 at 09:55
Although you can force the driver code to run on a particular Java version (export JAVA_HOME=/path/to/jre/ && spark-submit ...), the workers will execute the code with the default Java version from the yarn user's PATH on the worker machine.

What you can do is set each Spark instance to use a particular JAVA_HOME by editing the spark-env.sh files (documentation).
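As a minimal sketch (the JDK path here is an assumption; substitute the one installed on your nodes), the relevant line in conf/spark-env.sh on each node would be:

    # conf/spark-env.sh -- sourced when Spark processes are launched on that node
    export JAVA_HOME=/usr/java/jdk1.8.0_121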

If you want to set the Java environment for Spark on YARN, you can set it when invoking spark-submit:
--conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8.0_121 \
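For context, a sketch of how that flag fits into a full spark-submit invocation (the class name, JAR, and JDK path are placeholders; spark.executorEnv.JAVA_HOME is added as well, per the other answers, so the executors pick up the same JDK):

    spark-submit \
      --master yarn \
      --conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8.0_121 \
      --conf spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0_121 \
      --class com.example.MyApp \
      my-app.jar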

Add the JAVA_HOME that you want in spark-env.sh (to locate it: sudo find -name spark-env.sh ..., e.g. /etc/spark2/conf.cloudera.spark2_on_yarn/spark-env.sh)

The Java version needs to be set for both the Spark Application Master and the Spark executors that will be launched on YARN. Thus the spark-submit command must include two JAVA_HOME settings: spark.executorEnv.JAVA_HOME and spark.yarn.appMasterEnv.JAVA_HOME.
spark-submit --class com.example.DataFrameExample --conf "spark.executorEnv.JAVA_HOME=/jdk/jdk1.8.0_162" --conf "spark.yarn.appMasterEnv.JAVA_HOME=/jdk/jdk1.8.0_162" --master yarn --deploy-mode client /spark/programs/DataFrameExample/target/scala-2.12/dfexample_2.12-1.0.jar
