I have been running a Spark Scala job using 32 cores on a single-node Apache Spark 3.2.x cluster. The host machine has 256 GB of RAM and 128 cores. How can I provide the memory required for the entire job?
At present this job processes 10M records, constructs DataFrames, applies joins, and runs an MLlib algorithm.
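For context, a minimal sketch of that kind of pipeline is below (the class name, input paths, column names, and the choice of KMeans are placeholders, not the actual job):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.clustering.KMeans

object Main {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Main").getOrCreate()

    // Load roughly 10M records into DataFrames (paths are placeholders).
    val events  = spark.read.parquet("/data/events")
    val lookups = spark.read.parquet("/data/lookups")

    // Join the DataFrames on a shared key.
    val joined = events.join(lookups, Seq("id"))

    // Assemble numeric columns into the feature vector MLlib expects.
    val features = new VectorAssembler()
      .setInputCols(Array("f1", "f2", "f3"))
      .setOutputCol("features")
      .transform(joined)

    // Run an MLlib algorithm (KMeans is used here purely as an example).
    val model = new KMeans().setK(10).setSeed(1L).fit(features)
    model.summary.clusterSizes.foreach(println)

    spark.stop()
  }
}
```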
Run command: $/usr/lib/jvm/java-11-openjdk-amd64/bin/java -cp /usr/local/src//conf/:/usr/local/src/jars/* -Xmx96g org.apache.spark.deploy.SparkSubmit --master local[32] --conf spark.driver.memory=64g --class Main
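For comparison, the same settings expressed through the SparkSession builder would look roughly like this (a sketch only; the builder calls are my assumption about what Main does, not its actual code):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Main")
  .master("local[32]")              // 32 task threads inside a single JVM
  // In local mode the driver JVM also runs the tasks, so the usable heap is
  // whatever -Xmx / --driver-memory gave the JVM at launch; setting
  // spark.driver.memory here, after the JVM has started, does not resize it.
  .config("spark.driver.memory", "64g")
  .getOrCreate()
```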
Since we use multiple cores (32) on the same machine, would it make sense to treat those cores as executor nodes and allocate memory to them accordingly?
How can I allocate the optimal Spark driver memory?