
What is the difference between spark-submit's "--master" defined on the CLI and the Spark application code defining the master?

In Spark we can specify the master URI either in the application code, like below:

Spark Master configured in application code
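
For reference, a minimal sketch of this approach in Scala (the app name and master URI are illustrative, not from the original post):

```scala
import org.apache.spark.sql.SparkSession

// master hard-coded in the application itself
val spark = SparkSession.builder()
  .appName("MyApp")     // illustrative name
  .master("local[*]")   // master URI set in code
  .getOrCreate()
```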

Or we can specify the master URI as an argument to spark-submit's --master parameter, like below:

Spark-submit master option
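
For example (host, port, class, and jar names are illustrative):

```
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MyApp \
  my-app.jar
```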

Does one take precedence over the other? Do they have to agree, leaving me with two instances of the same URI, one in the spark-submit command and one in the application code that creates the SparkSession? Will one override the other? What does the SparkSession do differently with its master argument, and what does the spark-submit --master parameter do differently?

Any help would be greatly appreciated. Thank you!


2 Answers


To quote the official documentation:

The spark-submit script can load default Spark configuration values from a properties file and pass them on to your application. By default, it will read options from conf/spark-defaults.conf in the Spark directory. For more detail, see the section on loading default configurations.

Loading default Spark configurations this way can obviate the need for certain flags to spark-submit. For instance, if the spark.master property is set, you can safely omit the --master flag from spark-submit. In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.

If you are ever unclear where configuration options are coming from, you can print out fine-grained debugging information by running spark-submit with the --verbose option.

So all of these are valid options, and there is a well-defined hierarchy that determines precedence when the same option is set in more than one place (see the sketch after the list below). From highest to lowest:

  • Explicit settings in the application.
  • Command-line arguments passed to spark-submit.
  • Options from the configuration files.
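
A minimal sketch illustrating the top of that hierarchy (the app name and master values are illustrative): a master set explicitly in the application wins, no matter what spark-submit was given.

```scala
import org.apache.spark.sql.SparkSession

// explicit setting in the application: highest precedence
val spark = SparkSession.builder()
  .appName("PrecedenceDemo")   // illustrative name
  .master("local[2]")          // wins over any --master flag passed to spark-submit
  .getOrCreate()

// prints "local[2]" even if the job was launched with, say, --master yarn
println(spark.sparkContext.master)
```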

From the Spark documentation:

In general,

  • configuration values explicitly set on a SparkConf take the highest precedence,
  • then flags passed to spark-submit,
  • then values in the defaults file.

It strikes me that the most flexible approach is passing flags to spark-submit.
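
Concretely, that means leaving the master out of the code entirely and letting spark-submit supply it. A minimal sketch (the app name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

// no .master(...) call: the master comes from --master or spark-defaults.conf
val spark = SparkSession.builder()
  .appName("PortableApp")   // illustrative name
  .getOrCreate()
```

The same jar can then run anywhere: --master local[*] for a quick test, --master yarn on a cluster. And, as the documentation quoted above notes, running spark-submit with --verbose shows where each configuration value actually came from.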
