
How do I calculate optimal memory settings for a spark-submit command?

I am loading 4.5 GB of data into Spark from Oracle, performing some transformations such as a join with a Hive table, and writing the result back to Oracle. My question is how to come up with a spark-submit command with optimal memory parameters.

spark-submit --master yarn-cluster --driver-cores 2 \
--driver-memory 2G --num-executors 10 \
--executor-cores 5 --executor-memory 2G \
--class com.spark.sql.jdbc.SparkDFtoOracle2 \
Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

How do I calculate how much driver and executor memory is required, how many cores are needed, and so on?

lospejos
Arvind Kumar

1 Answer


That is, in general, a complex question with no silver-bullet answer. The optimal choice depends not only on your data characteristics and the type of operations, but also on system behavior (the Spark optimizer, etc.). Some useful tips can be found here.
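As a rough illustration only, a widely quoted rule of thumb (for example from Cloudera's Spark tuning guide) is to give each executor about 5 cores, leave one core and roughly 1 GB of memory per node for the OS and Hadoop daemons, reserve one executor slot for the YARN ApplicationMaster, and keep roughly 10% of each executor's memory as spark.yarn.executor.memoryOverhead. The sketch below applies that arithmetic to a made-up cluster (6 nodes, 16 cores and 64 GB each); the node counts and sizes are assumptions, not values from your question, so plug in your own numbers.

object ExecutorSizing {
  def main(args: Array[String]): Unit = {
    // Hypothetical cluster, for illustration only
    val nodes        = 6    // worker nodes
    val coresPerNode = 16   // vcores per node
    val memPerNodeGb = 64   // RAM per node in GB

    val usableCores  = coresPerNode - 1            // leave 1 core for OS/daemons
    val coresPerExec = 5                           // common heuristic for HDFS throughput
    val execsPerNode = usableCores / coresPerExec  // 15 / 5 = 3
    val numExecutors = nodes * execsPerNode - 1    // minus 1 slot for the YARN ApplicationMaster

    val usableMemGb   = memPerNodeGb - 1                       // leave ~1 GB for OS/daemons
    val memPerExecGb  = usableMemGb / execsPerNode              // 63 / 3 = 21
    val overheadGb    = math.max(0.384, memPerExecGb * 0.10)    // memoryOverhead: ~10%, min 384 MB
    val executorMemGb = (memPerExecGb - overheadGb).toInt       // value for --executor-memory

    println(s"--num-executors $numExecutors --executor-cores $coresPerExec --executor-memory ${executorMemGb}G")
  }
}

Treat the output only as a starting point: the driver usually needs far less than the executors unless you collect large results to it, and you should still adjust for the actual shuffle and join sizes you observe in the Spark UI.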

ShirishT