
I am using the spark-submit command to execute Spark jobs with parameters such as:

spark-submit --master yarn-cluster --driver-cores 2 \
 --driver-memory 2G --num-executors 10 \
 --executor-cores 5 --executor-memory 2G \
 --class com.spark.sql.jdbc.SparkDFtoOracle2 \
 Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Now I want to execute the same program using Spark's dynamic resource allocation. Could you please help with the usage of dynamic resource allocation for executing Spark programs?

Arvind Kumar

2 Answers


To use dynamic allocation in Spark, spark.dynamicAllocation.enabled needs to be set to true, because it is false by default.

This also requires spark.shuffle.service.enabled to be set to true, since the application runs on YARN, which means the external shuffle service must be started on each NodeManager in the cluster; a sketch of that setup follows.
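For reference, starting the shuffle service on YARN typically amounts to registering Spark's YarnShuffleService as an auxiliary service in yarn-site.xml on every NodeManager, putting the spark-<version>-yarn-shuffle.jar on the NodeManager classpath, and restarting the NodeManagers. A minimal sketch, assuming an otherwise default Hadoop configuration:

<!-- yarn-site.xml on each NodeManager -->
<property>
    <!-- keep mapreduce_shuffle if MapReduce jobs also run on this cluster -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>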

The following configurations are also relevant:

spark.dynamicAllocation.minExecutors
spark.dynamicAllocation.maxExecutors
spark.dynamicAllocation.initialExecutors

These options can be configured for a Spark application in three ways:

1. From spark-submit with --conf <prop_name>=<prop_value>

spark-submit --master yarn-cluster \
    --driver-cores 2 \
    --driver-memory 2G \
    --num-executors 10 \
    --executor-cores 5 \
    --executor-memory 2G \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=5 \
    --conf spark.dynamicAllocation.maxExecutors=30 \
    --conf spark.dynamicAllocation.initialExecutors=10 \
    --class com.spark.sql.jdbc.SparkDFtoOracle2 \
    Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Note: spark.dynamicAllocation.initialExecutors=10 is the same as --num-executors 10, so setting either one is enough.

2. Inside the Spark program with SparkConf

Set the properties on a SparkConf, then create the SparkSession or SparkContext with it; the values must be in place before the context starts:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Dynamic allocation settings must be applied before the context is created.
val conf: SparkConf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "5")
  .set("spark.dynamicAllocation.maxExecutors", "30")
  .set("spark.dynamicAllocation.initialExecutors", "10")

val spark = SparkSession.builder().config(conf).getOrCreate()

3. In spark-defaults.conf, usually located in $SPARK_HOME/conf/

Place the same configurations in spark-defaults.conf to apply them to all Spark applications whenever no value is passed on the command line or set in code (command-line and programmatic settings take precedence).
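For instance, with the same illustrative values used above, the entries in spark-defaults.conf would look like this:

spark.dynamicAllocation.enabled           true
spark.shuffle.service.enabled             true
spark.dynamicAllocation.minExecutors      5
spark.dynamicAllocation.maxExecutors      30
spark.dynamicAllocation.initialExecutors  10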

Spark - Dynamic Allocation Confs

mrsrinivas
  • In my cluster spark.dynamicAllocation.enabled is true. Do I need to do anything programmatically to use this feature? What should the spark-submit command be? – Arvind Kumar Oct 23 '16 at 07:34
  • No need. Specifying the maximum executors in any of the above places will be fine. – mrsrinivas Nov 15 '16 at 04:02
  • @mrsrinivas I have heard that with Spark 2.x dynamic allocation is no longer needed/supported? Can you please advise? – BdEngineer Dec 03 '18 at 08:35
  • @user3252097: I can't find anything like that; can you share the reference? – mrsrinivas Dec 04 '18 at 05:54
  • Why did you set --num-executors? You have already set spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors via conf. – Beyhan Gul May 14 '19 at 22:44
  • Not both; either one is fine. `--num-executors` on the shell == `spark.dynamicAllocation.initialExecutors` via conf. – mrsrinivas May 15 '19 at 08:23
  • @mrsrinivas What if I only set spark.dynamicAllocation.maxExecutors and spark.dynamicAllocation.minExecutors? Do I also need to set spark.dynamicAllocation.enabled to true? – Renato Bibiano Jun 17 '20 at 18:04
  • Yes, `spark.dynamicAllocation.{min/max}Executors` is relevant only if `spark.dynamicAllocation.enabled=true`; otherwise those configurations will be ignored. – mrsrinivas Jun 18 '20 at 02:32

I just did a small demo of Spark's dynamic resource allocation. The code is on my GitHub. Specifically, the demo is in this release.

AvinashK