
I have set up a Spark Standalone cluster with two virtual machines.
On the 1st VM (8 cores, 64 GB memory), I started the master manually with the command bin/spark-class org.apache.spark.deploy.master.Master.
On the 2nd VM (8 cores, 64 GB memory), I started a worker manually with
bin/spark-class org.apache.spark.deploy.worker.Worker spark://<hostname of master>:7077.
Then on the 1st VM, I also started a worker using the same worker command. As the screenshot below shows, both workers and the master are started and ALIVE.
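For reference, the same daemons can also be launched with the launcher scripts bundled with Spark. This is a minimal sketch assuming a stock layout under $SPARK_HOME (in newer Spark releases start-slave.sh is renamed start-worker.sh):

# On the master VM
$SPARK_HOME/sbin/start-master.sh

# On each worker VM, pointing at the master's URL
$SPARK_HOME/sbin/start-slave.sh spark://<hostname of master>:7077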

But when I run my Spark applications, only the worker on the 2nd VM is used (worker-20160613102937-10.0.37.150-47668). The worker on the 1st VM (worker-20160613103042-10.0.37.142-52601) doesn't run any executors. See the screenshot below.

Spark Standalone Cluster UI

I want both workers to be used by my Spark applications. How can this be done?

EDIT: See this screenshot of the Executor Summary, where the executors corresponding to the worker on the 1st VM have failed.

Executor Summary

When I click on any stdout or stderr link, it shows an "invalid log directory" error. See the screenshot below.

error
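For anyone debugging the same "invalid log directory" message, a minimal sketch of what to check on the failing VM (the paths are assumptions based on a default install under /usr/local/spark; a worker started directly via spark-class logs to its console, while the sbin scripts write under logs/):

# Can the worker's user actually write here?
ls -ld /usr/local/spark /usr/local/spark/work

# If the worker was started with the sbin scripts, check its log for the root cause
tail -n 50 /usr/local/spark/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out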

1 Answer


The error is resolved. Spark was not able to create the log directory on the 1st VM: the user I was submitting the Spark job as didn't have permission to create files under /usr/local/spark. Changing the read/write permissions of the directory (chmod -R 777 /usr/local/spark) did the trick.
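If world-writable permissions are broader than you want, a narrower sketch is to hand ownership of just the directories Spark writes to over to the submitting user (sparkuser is a hypothetical name; by default the worker creates work/ under the install directory):

# Give the submitting user ownership of the directories Spark writes to
sudo chown -R sparkuser:sparkuser /usr/local/spark/work /usr/local/spark/logs

# Then ensure the owner can read/write them (X keeps directories traversable)
sudo chmod -R u+rwX /usr/local/spark/work /usr/local/spark/logs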
