How to update required memory for single node Apache Spark Scala Job?

Question

I have been running a Spark Scala job on 32 cores of a single-node Apache Spark 3.2.x cluster. The host machine has 256 GB of RAM and 128 cores. How can I set the memory available to the entire job?

At present this job processes 10M records, constructs DataFrames, applies joins, and runs an MLlib algorithm.

Run command: $/usr/lib/jvm/java-11-openjdk-amd64/bin/java -cp /usr/local/src//conf/:/usr/local/src/jars/* -Xmx96g org.apache.spark.deploy.SparkSubmit --master local[32] --conf spark.driver.memory=64g --class Main

Since we use multiple cores (32) on the same machine, would it make sense to treat those cores as executor nodes and allocate memory to them?
How can I choose an optimal value for the Spark driver memory?

Spark is primarily meant as a distributed computation framework. If you only ever have a single machine, there's not much value in using Spark. — Gaël J, Aug 23 '23 at 04:21
Thank you @GaëlJ. Since our product demands a single machine, we were leveraging multiple cores on that machine. It looks like it is time to consider a cluster-mode job for Spark. Spark Command: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -cp /usr/local/src//conf/:/usr/local/src/jars/* -Xmx96g org.apache.spark.deploy.SparkSubmit --master local[32] --conf spark.driver.memory=64g . Does this mean the 96g JVM heap includes the Spark driver memory? — user648330, Aug 23 '23 at 05:38
Spark is built for distributed computing, meaning that it does a lot of extra work to make that distributed computing possible. This trade-off is worth it when your workload cannot run on a single machine. But using Spark on a single huge machine can sacrifice anywhere between 10% and 90% of performance (depending on your code) compared to a non-Spark version. More about how Spark uses memory - https://www.linkedin.com/pulse/apache-spark-memory-management-deep-dive-deepak-rajak/ — sarveshseri, Aug 23 '23 at 10:39
https://books.japila.pl/apache-spark-internals/memory/UnifiedMemoryManager/#creating-unifiedmemorymanager — sarveshseri, Aug 23 '23 at 10:46
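
As a rough illustration of the memory layout described in the links above: in local mode the driver and the task threads share a single JVM, so the heap given by -Xmx is the pool that Spark's unified memory manager divides up. The sketch below is plain Scala with no Spark dependency. The object name LocalModeMemorySketch and its regions helper are hypothetical, and the constants (300 MB reserved, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5) are the documented Spark 3.x defaults, assumed here rather than read from a running job.

```scala
// Hypothetical helper, not part of Spark: estimates how a JVM heap would be
// split by the unified memory manager under default settings.
object LocalModeMemorySketch {
  // Assumed defaults for Spark 3.x.
  val ReservedBytes: Long = 300L * 1024 * 1024 // memory Spark sets aside for itself
  val MemoryFraction: Double = 0.6             // spark.memory.fraction
  val StorageFraction: Double = 0.5            // spark.memory.storageFraction

  final case class Regions(unified: Long, storage: Long, execution: Long)

  // unified   = (heap - reserved) * memoryFraction
  // storage   = unified * storageFraction (initial size; the boundary is soft)
  // execution = unified - storage
  def regions(
      heapBytes: Long,
      memoryFraction: Double = MemoryFraction,
      storageFraction: Double = StorageFraction): Regions = {
    val usable = math.max(heapBytes - ReservedBytes, 0L)
    val unified = (usable * memoryFraction).toLong
    val storage = (unified * storageFraction).toLong
    Regions(unified, storage, unified - storage)
  }

  def main(args: Array[String]): Unit = {
    // Heap actually granted to this JVM; it may be slightly below -Xmx.
    val heap = Runtime.getRuntime.maxMemory
    val r = regions(heap)
    val gib = 1024.0 * 1024 * 1024
    println(f"JVM max heap:       ${heap / gib}%.2f GiB")
    println(f"Unified region:     ${r.unified / gib}%.2f GiB")
    println(f"  initial storage:  ${r.storage / gib}%.2f GiB")
    println(f"  execution:        ${r.execution / gib}%.2f GiB")
  }
}
```

Running this with the same -Xmx as the Spark command gives an estimate of how much of the heap is available for caching and for joins or shuffles. The remainder of the heap is left for user objects and Spark's internal metadata. When java is invoked directly as in the command above, the heap size most likely follows -Xmx rather than spark.driver.memory, because the JVM has already started by the time that setting is read.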
