1

I'm running a PySpark script stored on the master node of an AWS EMR cluster (1 master and 2 slaves, each with 8 GB RAM and 4 cores) with the command:

spark-submit --master yarn --deploy-mode cluster \
  --jars /home/hadoop/mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar \
  --driver-class-path /home/hadoop/mysql-connector-java-5.1.45/mysql-connector-java-5.1.45.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/mysql-connector-java-5.1.45/mysql-connector-java-5.1.45.jar \
  --driver-memory 2g --executor-cores 3 --num-executors 3 --executor-memory 5g \
  mysql_spark.py

There are 2 things that I notice:

  1. I SSH into the slave nodes and notice that one of them is not being used at all (I used htop for this; screenshot of the 2 slave nodes attached). This is how it looked throughout. Is there something wrong with my spark-submit command?
  2. Before the application was submitted, 6.54 GB of the master node's 8 GB of RAM was already in use (again via htop). There are no other applications running. Why is this happening?
ouila

2 Answers

1

First of all, you have set --deploy-mode to cluster, which means the master node isn't counted; only the core/task nodes' resources are considered and eligible to launch the Spark driver and executors.

See the Spark documentation for more information about the difference between the client and cluster deploy modes.
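As a quick illustration (a sketch only, with the remaining flags elided as "..."), the same job submitted in the two modes differs only in where the driver runs:

# client mode: the driver runs on the node where you invoke spark-submit (the master here)
spark-submit --master yarn --deploy-mode client ... mysql_spark.py

# cluster mode: the driver runs inside a YARN container on one of the core/task nodes
spark-submit --master yarn --deploy-mode cluster ... mysql_spark.py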

Second: check the instance type's configuration for the property yarn.scheduler.maximum-allocation-mb, which is the maximum memory that can be assigned to a single driver/executor container.
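One way to confirm the value on a running cluster node (a sketch; /etc/hadoop/conf is the usual yarn-site.xml location on EMR, adjust the path if your layout differs):

# Print yarn.scheduler.maximum-allocation-mb from the YARN configuration
grep -A 1 'yarn.scheduler.maximum-allocation-mb' /etc/hadoop/conf/yarn-site.xml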

Third: sizing. For example, if the core/task nodes are of type c5.xlarge, then yarn.scheduler.maximum-allocation-mb = 6144, so each node can launch only a single executor of 5.5 GB (--executor-memory = 5g + 10% memoryOverhead by default). The driver (2 GB plus overhead) will be launched on one node, and the other node will launch a single executor.
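As a back-of-the-envelope check (a sketch in plain bash; max(10% of the heap, 384 MB) is Spark's default memoryOverhead rule):

# Container sizes implied by the original flags, in MB
EXEC_MEM=5120                                                # --executor-memory 5g
EXEC_OVH=$(( EXEC_MEM / 10 > 384 ? EXEC_MEM / 10 : 384 ))    # default overhead
echo "executor container: $(( EXEC_MEM + EXEC_OVH )) MB"     # 5632 MB, ~5.5 GB
DRV_MEM=2048                                                 # --driver-memory 2g
DRV_OVH=$(( DRV_MEM / 10 > 384 ? DRV_MEM / 10 : 384 ))       # 384 MB floor applies
echo "driver container: $(( DRV_MEM + DRV_OVH )) MB"         # 2432 MB

So only one 5632 MB executor container fits under a 6144 MB per-node limit, and no node has room for a second one.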

Recommendation: size the executors so that two containers fit in 6144 MB, i.e. roughly half the limit each; then one node can launch 2 executors and the other node can launch 1 executor plus the driver (1 driver + 3 executors in total).
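Under that assumption, the resized submission could look roughly like this (illustrative numbers only: 2 GB heaps keep each container, including the default overhead, under half of the 6144 MB allocation; keep your own jar paths and tune to your actual limit):

spark-submit --master yarn --deploy-mode cluster \
  --jars /home/hadoop/mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar \
  --driver-class-path /home/hadoop/mysql-connector-java-5.1.45/mysql-connector-java-5.1.45.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/mysql-connector-java-5.1.45/mysql-connector-java-5.1.45.jar \
  --driver-memory 2g --executor-memory 2g --executor-cores 2 --num-executors 3 \
  mysql_spark.py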

0

You have specified --num-executors 3, so there will be 4 YARN containers in total (1 driver + 3 executors).

Each worker node has 4 vCores and 8 GB of RAM.

As per your config:

  • The driver will use 1 vCore and 2 GB of memory.
  • Each executor will use 3 vCores and 5 GB of memory.

So, looking at your config, your program should use both worker nodes, since one node alone is not enough to accommodate all the requested resources.


I suggest you check:

  1. YARN UI http://<master-ip>:8088
  2. Spark history server UI http://<master-ip>:18080

See on which nodes those executors spun up (each executor is associated with a node IP) and how many there are; you can check that by navigating to that specific job. Also verify from the Spark UI how much memory and CPU have been used by each executor. A command-line alternative is sketched below.
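If you prefer the command line to the web UIs, a quick sketch run from the master node (yarn node and yarn application are standard YARN CLI commands; the REST endpoint is the ResourceManager's cluster-nodes API):

# Per-node capacity and how many containers each NodeManager is running right now
yarn node -list -all

# Running applications with their ApplicationMaster / tracking URLs
yarn application -list

# The same node information as JSON, useful for memory/vCore usage per node
curl -s http://<master-ip>:8088/ws/v1/cluster/nodes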

Snigdhajyoti