
When using EMR (with Spark and Zeppelin), changing spark.driver.memory in the Zeppelin Spark interpreter settings doesn't work.

What is the best and quickest way to set the Spark driver memory when creating clusters through the EMR web interface (not the AWS CLI)?

Could a bootstrap action be a solution? If so, could you please provide an example of what the bootstrap action file should look like?

Rami

1 Answer


You can always try adding the following configuration at job flow/cluster creation:

[
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.driver.memory": "12G"
        }
    }
]

You can do this for most configurations, whether for spark-defaults, Hadoop core-site, etc.
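If you ever create the cluster programmatically rather than through the console, the same classification JSON can be passed as the Configurations parameter of boto3's run_job_flow. A minimal sketch (the boto3 call is commented out so it runs without AWS credentials; the cluster name and other run_job_flow arguments are placeholders, not from the answer):

```python
import json

# The same classification block from the answer, as a Python structure.
configurations = [
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.driver.memory": "12G"
        }
    }
]

# In a real cluster-creation script it would be passed like this
# (requires boto3 and configured AWS credentials):
# import boto3
# emr = boto3.client("emr")
# emr.run_job_flow(Name="my-cluster", Configurations=configurations, ...)

# Serialize to verify it matches the JSON shown above.
print(json.dumps(configurations, indent=4))
```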

I hope this helps!

eliasah
  • Unfortunately that didn't work. I think it should be a spark configuration rather than a zeppelin one? – Rami Nov 28 '17 at 13:50
  • Ok let me update my answer so you can set it in spark conf – eliasah Nov 28 '17 at 13:51
  • 1
    Glad it helped @Rami – eliasah Nov 28 '17 at 14:07
  • sure it helped! As always ;) – Rami Nov 28 '17 at 14:10
  • Well, it is a Zeppelin configuration after all, because Zeppelin is what calls spark-submit; but unless configured otherwise, it uses the Spark default arguments for the spark-submit command. So rather than setting up Zeppelin, this answer solves it by modifying the spark-defaults themselves. Great answer! – Radu Simionescu Jan 09 '18 at 18:01
  • To set the spark defaults using the console, you can use the instructions here (you'll have to scroll a little) https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html – Dominic Dao May 12 '23 at 20:07