I'm doing some tests with Spark in VirtualBox. The host machine has a CPU with 8 cores, and I would like to test Spark with as many cores as possible in the VirtualBox environment to get the best possible performance.
I'm using 3 VirtualBox machines: one master and two slaves. In the VirtualBox settings I configured the master machine with 2 GB RAM and 1 CPU, and each slave machine with 4 GB RAM and 3 CPUs.
When I start the spark-shell on YARN with "spark-shell --master yarn-client", the cluster appears with these settings:
But when I execute a query, the same query that takes 4 min on just one node without YARN takes 2.5 min with 3 nodes, so there is not much difference.
Do you know how I can better configure this environment to increase performance? Is it possible to configure Spark on YARN with more cores, given that the host machine has a CPU with 8 cores?
I did not do any Spark cores configuration to get the values in the image above; the only configuration I did in Spark was:
(spark-env.sh)
SPARK_JAVA_OPTS=-Dspark.driver.port=53411
HADOOP_CONF_DIR=$HADOOP_HOME/conf
SPARK_MASTER_IP=master
(spark-defaults.conf)
spark.master spark://master:7077
spark.serializer org.apache.spark.serializer.KryoSerializer
(slaves)
slave1
slave2
And to start spark:
spark-shell --master yarn-client
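As far as I understand, the executor resources can also be requested explicitly with the --num-executors, --executor-cores and --executor-memory options of spark-shell. Below is a minimal sketch of what I assume such a launch would look like for the VM sizes described above. The 2g executor memory is a guess on my part, not a tested value, and the script only prints the command instead of running it. I have not checked whether YARN would actually grant these containers; that also depends on the NodeManager memory and vcore settings, and on leaving room for the application master.

```shell
#!/bin/sh
# VM sizes taken from the VirtualBox settings described in the question
MASTER_CPUS=1
SLAVE_CPUS=3
SLAVES=2
HOST_CORES=8

# Total virtual CPUs handed to the three VMs
TOTAL_VCPUS=$((MASTER_CPUS + SLAVE_CPUS * SLAVES))
echo "vCPUs allocated: $TOTAL_VCPUS of $HOST_CORES host cores"

# Assumption: one executor per slave, using every vCPU of that VM.
# EXECUTOR_MEMORY is a hypothetical value chosen to fit inside a 4 GB slave.
EXECUTOR_MEMORY=2g
CMD="spark-shell --master yarn-client --num-executors $SLAVES --executor-cores $SLAVE_CPUS --executor-memory $EXECUTOR_MEMORY"

# Print the command rather than executing it
echo "$CMD"
```

With the values above this accounts for 7 of the 8 host cores, so at most one more vCPU could be given to a VM without oversubscribing the host.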