
I want to tune my Spark cluster on AWS EMR, but I can't change the default value of spark.driver.memory, which causes every Spark application to crash because my dataset is big.

I tried editing the spark-defaults.conf file manually on the master node, and I also tried configuring it directly with a JSON file on the EMR dashboard while creating the cluster.

Here's the JSON file used:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.memory": "7g",
      "spark.driver.cores": "5",
      "spark.executor.memory": "7g",
      "spark.executor.cores": "5",
      "spark.executor.instances": "11"
    }
  }
]
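
For reference, the same classifications can also be supplied programmatically when the cluster is created; below is a minimal boto3 sketch (the region, cluster name, EMR release, instance types, and IAM roles are placeholder assumptions, adjust them to your setup):

import boto3

# Placeholder region -- use your own.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="tuned-spark-cluster",          # placeholder cluster name
    ReleaseLabel="emr-5.23.0",           # assumed EMR release
    Applications=[{"Name": "Spark"}],
    Configurations=[
        {
            "Classification": "spark-defaults",
            "Properties": {
                "spark.driver.memory": "7g",
                "spark.driver.cores": "5",
                "spark.executor.memory": "7g",
                "spark.executor.cores": "5",
                "spark.executor.instances": "11",
            },
        }
    ],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 3},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])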

After using the JSON file, the configurations show up correctly in spark-defaults.conf, but the Spark dashboard always shows the default value of 1000M for spark.driver.memory, while the other values are applied correctly (a quick way to check the effective value from inside the application is sketched below). Has anyone run into the same problem? Thank you in advance.
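
A quick way to see which value the running application actually resolved, rather than what spark-defaults.conf contains, is to read it back from the SparkConf; a minimal PySpark sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Print the driver memory the application is actually running with;
# if this still shows 1000M/1g, the value from spark-defaults.conf was
# overridden or ignored when the driver JVM was launched.
print(spark.sparkContext.getConf().get("spark.driver.memory", "not set"))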

asr9
yassidhbi

1 Answer


You need to set

maximizeResourceAllocation=true

in the spark classification of your EMR configuration:

[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]
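
If the goal is only to raise the driver memory for a single application, another option (not part of the maximizeResourceAllocation approach above, and assuming the script launches its own driver JVM, i.e. it is started with plain python rather than spark-submit) is a sketch like:

from pyspark.sql import SparkSession

# spark.driver.memory only takes effect if it is set before the driver JVM
# starts. When the session builder launches the JVM (plain `python app.py`),
# this works; with spark-submit, pass --driver-memory 7g on the command line
# instead.
spark = (
    SparkSession.builder
    .appName("driver-memory-override")   # placeholder app name
    .config("spark.driver.memory", "7g")
    .getOrCreate()
)

print(spark.sparkContext.getConf().get("spark.driver.memory"))

With spark-submit, the equivalent is the --driver-memory flag, since spark.driver.memory cannot be changed once the driver JVM is already running.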
Vishnu
  • I actually used that before trying manual values, but the spark.driver.memory value didn't change at all. – yassidhbi Apr 12 '19 at 08:18