
I'm doing some tests with Spark in VirtualBox. The host machine has an 8-core CPU, and I would like to test Spark with as many cores as possible in the VirtualBox environment to get the best possible performance.

I'm using three VirtualBox machines: one master and two slaves. In the VirtualBox settings I gave the master 2 GB RAM and 1 CPU, and each slave 4 GB RAM and 3 CPUs.

When I start spark-shell on YARN with "spark-shell --master yarn-client", the cluster shows these settings:

[Screenshot of the settings shown by spark-shell on YARN; image not available]

But when I run a query, it takes 4 min on a single node without YARN and 2.5 min on the 3 nodes with YARN, so there is not much difference.
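To put a number on "not much difference": with 4 min = 240 s and 2.5 min = 150 s, the speedup S and the efficiency E relative to three machines work out as below. This assumes the two runs are otherwise comparable, which is only roughly true since the single-node run did not go through YARN.

```latex
S = \frac{T_{1\,\text{node}}}{T_{3\,\text{nodes}}}
  = \frac{240\ \text{s}}{150\ \text{s}} = 1.6,
\qquad
E = \frac{S}{3} \approx 0.53
```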

Do you know how I can configure this environment better to increase performance? Is it possible to configure Spark on YARN to use more cores, given that the host machine has an 8-core CPU?
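One rough way to reason about this question is to size executors from what each slave VM offers. The function below is a hypothetical helper (plan_executors is not part of Spark or YARN). It assumes identical worker nodes, a fixed amount of memory per node (reserved_mb, 1024 MB by default) left for the OS and Hadoop daemons, and an executor memory overhead of max(384 MB, 10% of the heap), which matches recent Spark releases (older ones used a smaller fraction). It ignores the ApplicationMaster container that yarn-client mode also starts on one of the nodes, so YARN may grant one executor fewer than requested. The flags it prints, --num-executors, --executor-cores and --executor-memory, are standard spark-shell/spark-submit options on YARN, where the defaults are 2 executors with 1 core each. YARN can also only hand out what each NodeManager advertises through yarn.nodemanager.resource.cpu-vcores and yarn.nodemanager.resource.memory-mb in yarn-site.xml (defaults 8 vcores and 8192 MB, more than these VMs actually have).

```shell
#!/usr/bin/env bash
# plan_executors: hypothetical sizing helper, not part of Spark or YARN.
# Usage: plan_executors NODES VCORES_PER_NODE MEM_MB_PER_NODE CORES_PER_EXECUTOR [RESERVED_MB]
# Assumes identical worker nodes and an executor memory overhead of
# max(384 MB, 10% of the executor heap).
plan_executors() {
  local nodes=$1 vcores=$2 mem_mb=$3 cores_per_exec=$4 reserved_mb=${5:-1024}

  if (( cores_per_exec < 1 || mem_mb <= reserved_mb )); then
    echo "plan_executors: not enough resources per node" >&2
    return 1
  fi

  # Memory left for YARN containers after the OS and Hadoop daemons.
  local usable_mb=$(( mem_mb - reserved_mb ))
  # How many executors of the requested width fit on one node.
  local execs_per_node=$(( vcores / cores_per_exec ))
  if (( execs_per_node < 1 )); then
    echo "plan_executors: executor wider than a node" >&2
    return 1
  fi

  # Memory budget for one executor container (heap + overhead).
  local container_mb=$(( usable_mb / execs_per_node ))
  # Largest heap H with H + 384 <= container and 1.1 * H <= container.
  local heap_a=$(( container_mb - 384 ))
  local heap_b=$(( container_mb * 10 / 11 ))
  local heap_mb=$(( heap_a < heap_b ? heap_a : heap_b ))
  if (( heap_mb < 1 )); then
    echo "plan_executors: container too small for the overhead" >&2
    return 1
  fi

  echo "--num-executors $(( nodes * execs_per_node )) --executor-cores ${cores_per_exec} --executor-memory ${heap_mb}m"
}
```

For the two 3-vCPU, 4 GB slaves, plan_executors 2 3 4096 3 prints --num-executors 2 --executor-cores 3 --executor-memory 2688m (one 3-core executor per slave), which could be passed as spark-shell --master yarn-client $(plan_executors 2 3 4096 3). Asking for 1-core executors instead (plan_executors 2 3 4096 1) gives 6 executors of 640m each; fewer, wider executors leave more heap per task.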

I did not set any Spark core configuration to get the values in the image above; the only settings I changed in Spark were:

(spark-env.sh)

SPARK_JAVA_OPTS=-Dspark.driver.port=53411
HADOOP_CONF_DIR=$HADOOP_HOME/conf
SPARK_MASTER_IP=master

(spark-defaults.conf)

spark.master            spark://master:7077
spark.serializer        org.apache.spark.serializer.KryoSerializer

(slaves)

slave1
slave2 

And to start Spark:

spark-shell --master yarn-client
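Executor sizing can also live in spark-env.sh next to the settings listed above. This is a minimal sketch, assuming a Spark 1.x or 2.x installation whose conf/spark-env.sh.template lists SPARK_EXECUTOR_CORES and SPARK_EXECUTOR_MEMORY among the options read in YARN client mode (check the template shipped with your version). The values are illustrative for one executor per 3-vCPU, 4 GB slave, not tuned recommendations; the number of executors is still set with spark-shell's --num-executors flag.

```shell
# conf/spark-env.sh (sourced by Spark's launch scripts)
# Illustrative values, assuming one executor per 3-vCPU / 4 GB slave VM.
SPARK_EXECUTOR_CORES=3     # cores per executor (YARN default is 1)
SPARK_EXECUTOR_MEMORY=2G   # executor heap; YARN adds a memory overhead on top
```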
Asked by jUsr
