I'm provisioning a Google Cloud Dataproc cluster in the following way:

    gcloud dataproc clusters create spark --async --image-version 1.2 \
        --master-machine-type n1-standard-1 --master-boot-disk-size 10 \
        --worker-machine-type n1-highmem-8 --num-workers 4 --worker-boot-disk-size 10 \
        --num-worker-local-ssds 1

Launching a Spark application in yarn-cluster mode with

  • spark.driver.cores=1
  • spark.driver.memory=1g
  • spark.executor.instances=4
  • spark.executor.cores=8
  • spark.executor.memory=36g

will only ever launch 3 executor instances instead of the requested 4, effectively "wasting" a full worker node, which then appears to run nothing but the driver. Reducing spark.executor.cores from 8 to 7 to "reserve" a core on a worker node for the driver does not help either.

What configuration is required to run the driver in yarn-cluster mode alongside executor processes, making optimal use of the available resources?
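For reference, the submission looks roughly like this (a sketch: the jar path and main class are placeholders, and the properties mirror the list above):

    spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --conf spark.driver.cores=1 \
        --conf spark.driver.memory=1g \
        --conf spark.executor.instances=4 \
        --conf spark.executor.cores=8 \
        --conf spark.executor.memory=36g \
        --class com.example.MyApp \
        gs://my-bucket/my-app.jar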

Martin Studer
  • Out of curiosity, which feature of cluster mode (as opposed to the default client mode of Dataproc) are you looking to benefit from? – Angus Davis Dec 28 '17 at 17:30
  • @AngusDavis At the moment it's purely a permissions thing. The application establishing the Spark connection is running on GKE which currently seems to lack some permissions required to successfully run the driver. – Martin Studer Dec 29 '17 at 09:32

1 Answer

An n1-highmem-8 using Dataproc 1.2 is configured to have 40960m allocatable per YARN NodeManager. Instructing Spark to use 36g of heap memory per executor also adds 3.6g of memoryOverhead (0.1 * heap memory), for a request of 39.6g ≈ 40550m. YARN normalizes requests up to a multiple of its minimum allocation (1024m here), so each executor container occupies the full 40960m.
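You can check the per-node capacity YARN advertises through the standard ResourceManager REST API (run from the master node; the exact field names can vary slightly across Hadoop versions):

    # Lists every NodeManager with its memory figures;
    # usedMemoryMB + availMemoryMB should come to 40960 on these workers.
    curl -s http://localhost:8088/ws/v1/cluster/nodes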

The driver will use 1g of heap and 384m of memoryOverhead (the minimum value), a 1408m request that YARN rounds up to 2g. As the driver always launches before the executors, its memory is allocated first. Three 40960m executor requests then fit on the three empty workers, but the node hosting the driver has only 40960m - 2048m = 38912m left, which cannot satisfy a 40960m request, so no executor container is ever allocated on the same node as the driver.

Using spark.executor.memory=34g instead yields 34g of heap plus 3.4g of overhead = 37.4g ≈ 38298m, which YARN rounds up to 38912m. That container and the 2048m driver container together come to exactly 40960m, allowing the driver and an executor to run on the same node.
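Concretely, only the executor memory changes relative to the submission in the question (again a sketch; jar and class are placeholders):

    spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --conf spark.driver.cores=1 \
        --conf spark.driver.memory=1g \
        --conf spark.executor.instances=4 \
        --conf spark.executor.cores=8 \
        --conf spark.executor.memory=34g \
        --class com.example.MyApp \
        gs://my-bucket/my-app.jar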

Angus Davis