
When using EMR (with Spark and Zeppelin), changing spark.driver.memory in the Zeppelin Spark interpreter settings doesn't work.

What is the best and quickest way to set the Spark driver memory when creating clusters through the EMR web interface (not the AWS CLI)?

Could a bootstrap action be a solution? If so, could you please provide an example of what the bootstrap action file should look like?

Rami

1 Answer


You can always try adding the following configuration at job flow/cluster creation:

[
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.driver.memory": "12G"
        }
    }
]

You can do this for most configurations, whether for spark-defaults, Hadoop core-site, etc.
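If you ever create the cluster programmatically rather than through the console, the same classification JSON can be passed as the Configurations parameter of boto3's run_job_flow. A minimal sketch (the boto3 call is commented out so it runs without AWS credentials; the cluster name and other run_job_flow arguments are placeholders, not from the answer):

```python
import json

# The same classification block from the answer, as a Python structure.
configurations = [
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.driver.memory": "12G"
        }
    }
]

# In a real cluster-creation script it would be passed like this
# (requires boto3 and configured AWS credentials):
# import boto3
# emr = boto3.client("emr")
# emr.run_job_flow(Name="my-cluster", Configurations=configurations, ...)

# Serialize to verify it matches the JSON shown above.
print(json.dumps(configurations, indent=4))
```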

I hope this helps!

eliasah
  • Unfortunately that didn't work. I think it should be a spark configuration rather than a zeppelin one? – Rami Nov 28 '17 at 13:50
  • Ok let me update my answer so you can set it in spark conf – eliasah Nov 28 '17 at 13:51
  • 1
    Glad it helped @Rami – eliasah Nov 28 '17 at 14:07
  • sure it helped! As always ;) – Rami Nov 28 '17 at 14:10
  • Well, it is a Zeppelin configuration after all, because Zeppelin is what calls spark-submit; but unless configured otherwise, it uses the Spark default arguments for the spark-submit command. So rather than setting up Zeppelin, this answer solves it by modifying the spark-defaults themselves. Great answer! – Radu Simionescu Jan 09 '18 at 18:01
  • To set the spark defaults using the console, you can use the instructions here (you'll have to scroll a little) https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html – Dominic Dao May 12 '23 at 20:07