I have a standalone Spark cluster in HA mode (two masters) and a couple of workers registered with it.
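For context, the two masters are configured for HA the usual ZooKeeper way, roughly like this (the ZooKeeper quorum address shown here is only a placeholder, not my exact value):

# spark-env.sh on both masters (ZooKeeper address is a placeholder)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 -Dspark.deploy.zookeeper.dir=/spark"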
I submitted the Spark job via the REST interface with the following details:
{
  "sparkProperties": {
    "spark.app.name": "TeraGen3",
    "spark.default.parallelism": "40",
    "spark.executor.memory": "512m",
    "spark.driver.memory": "512m",
    "spark.task.maxFailures": "3",
    "spark.jars": "file:///tmp//test//spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar",
    "spark.eventLog.enabled": "false",
    "spark.submit.deployMode": "cluster",
    "spark.driver.supervise": "true",
    "spark.master": "spark://spark-hn0:7077,spark-hn1:7077"
  },
  "mainClass": "com.github.ehiggs.spark.terasort.TeraGen",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "action": "CreateSubmissionRequest",
  "appArgs": ["4g", "file:///tmp/data/teradata4g/"],
  "appResource": "file:///tmp//test//spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar",
  "clientSparkVersion": "2.1.1"
}
This request is submitted to the active Spark master via the REST interface (http://spark-hn1:6066/v1/submissions/create).
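Roughly, the call looks like this (with the JSON above saved to a file; submission.json is just an illustrative name):

curl -X POST --header "Content-Type: application/json" --data @submission.json http://spark-hn1:6066/v1/submissions/create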
When the driver is launched, -Dspark.master is set to "spark://spark-hn1:7077" instead of the value passed in sparkProperties, which is "spark://spark-hn0:7077,spark-hn1:7077".
Logs from the worker node where the driver is running:
17/12/18 13:29:49 INFO worker.DriverRunner: Launch Command: "/usr/lib/jvm/java-8-openjdk-amd64/bin/java" "-Dhdp.version=2.6.99.200-0" "-cp" "/usr/hdp/current/spark2-client/conf/:/usr/hdp/current/spark2-client/jars/*:/etc/hadoop/conf/" "-Xmx512M" "-Dspark.driver.memory=512m" "-Dspark.master=spark://spark-hn1:7077" "-Dspark.executor.memory=512m" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=TeraGen3" "-Dspark.default.parallelism=40" "-Dspark.jars=file:///tmp//test//spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar" "-Dspark.task.maxFailures=3" "-Dspark.driver.supervise=true" "-Dspark.eventLog.enabled=false" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@172.18.0.4:40803" "/var/spark/work/driver-20171218132949-0001/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar" "com.github.ehiggs.spark.terasort.TeraGen" "4g" "file:///tmp/data/teradata4g/"
This causes a problem when the active master goes down during job execution and the other master becomes active: since the driver only knows about one master (the old one), it cannot reach the new master to continue the job (even with spark.driver.supervise=true).
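For reference, my understanding is that the equivalent spark-submit invocation takes the comma-separated master list directly; this is only a sketch using the same jar and arguments as in the request above:

./bin/spark-submit \
  --master spark://spark-hn0:7077,spark-hn1:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.github.ehiggs.spark.terasort.TeraGen \
  --driver-memory 512m \
  --executor-memory 512m \
  --conf spark.default.parallelism=40 \
  file:///tmp//test//spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar \
  4g file:///tmp/data/teradata4g/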
What is the right way to pass multiple master URLs through the Spark REST interface?