
I have set up Spark on a cluster of 3 nodes: one is my namenode/master (named h1) and the other two are my datanode/workers (named h2 and h3). When I run a Spark job from my master, the job does not seem to be distributed to the workers; it appears to run only on the master. The command I used to submit the Spark job is

bin/spark-submit --class org.dataalgorithms.chap07.spark.FindAssociationRules /home/ubuntu/project_spark/data-algorithms-1.0.0.jar ./in/xaa

The reason I think it is only running on the master is that when I open the Spark application UI, I see only the master h1 in the executor list. Shouldn't I also see my worker nodes h2 and h3 there? (screenshot: Spark UI)

Correct me if I am wrong. I am a newbie, so please excuse my ignorance.

learning_dev
  • Please look at the UI of your master to see the list of available workers, not the driver's UI – ernest_k Apr 18 '18 at 15:37
  • You don't seem to pass in the --master and --deploy-mode switches – Michel Lemay Apr 18 '18 at 15:43
  • @ErnestKiwele This is my master's UI. I guess my driver is running on my master by default – learning_dev Apr 18 '18 at 15:46
  • Then there's something basic missing. How are you telling spark-submit to deploy it to the cluster? What resource manager are you using? – ernest_k Apr 18 '18 at 15:47
  • @MichelLemay I tried to pass the --master and --deploy-mode switches, but it gives me some binding error. Is it absolutely necessary to pass those if I want to run in cluster mode? I would assume I should be able to run a normal command and still use my worker nodes if they are set up in the slaves file? – learning_dev Apr 18 '18 at 15:47
  • @ErnestKiwele The resource manager is the default one that comes with Hadoop. Do I explicitly have to tell spark-submit to deploy it to the cluster? Doesn't it do that automatically? – learning_dev Apr 18 '18 at 15:51
  • @user7623678 Would help if you go through deployment documentation. Please take a look at: https://spark.apache.org/docs/latest/cluster-overview.html – ernest_k Apr 18 '18 at 15:51
  • @ErnestKiwele I read it a couple of times but still don't understand what I am missing – learning_dev Apr 18 '18 at 16:06
  • Have you set yarn as the master in your code? You can also configure the deployment at the code level. Then try specifying the number of executors as 2 and a suitable number of cores as parameters while submitting the job – Kiran Balakrishnan Apr 18 '18 at 17:19
  • I'm not sure what your problem is, but here we always start with --master yarn --deploy-mode cluster, along with the number of cores and memory for the driver and executors as well as the number of executors. Be aware that the default values might not fully allocate your cluster because of some memory overhead in YARN; YARN might not be able to allocate the requested number of executors. – Michel Lemay Apr 18 '18 at 19:02

2 Answers


You haven't specified the mode in which you are deploying your job. You need to pass --deploy-mode to deploy the job to the cluster, and you also need to specify --master, which can be YARN or Mesos (or a standalone Spark master URL).

Also, when you specify YARN, you need to make sure the resources you request, such as executor-memory, executor-cores, and num-executors, can actually be granted by the cluster manager, i.e. YARN. YARN provides different schedulers to allocate resources, so you need to check which type of scheduler you have configured.

Read about schedulers here

https://blog.cloudera.com/blog/2016/01/untangling-apache-hadoop-yarn-part-3/

spark-submit --num-executors 50 --executor-memory 4G --executor-cores 4 --master yarn --deploy-mode cluster --class <main-class> <application-jar> [application-args]
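
Applied to the command from the question (the class name, jar path, and input argument are taken from the question; the executor settings below are only illustrative and should be sized to your cluster), the submission might look something like this:

bin/spark-submit --class org.dataalgorithms.chap07.spark.FindAssociationRules --master yarn --deploy-mode cluster --num-executors 2 --executor-memory 2G --executor-cores 2 /home/ubuntu/project_spark/data-algorithms-1.0.0.jar ./in/xaa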
wandermonk

Thank you for all the help and suggestions. I tried many of them but ended up with one error or another. What helped me was specifying --master spark://IP:PORT with my regular command, so my new submission command looked like this

bin/spark-submit --class org.dataalgorithms.chap07.spark.FindAssociationRules --master spark://IP:PORT /home/ubuntu/project_spark/data-algorithms-1.0.0.jar ./in/xaa

This started my Spark job in a truly distributed cluster mode.
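
For reference (this detail is standard Spark standalone behaviour rather than something stated above): the standalone master listens on port 7077 by default, and the exact spark:// URL is shown at the top of the master's web UI. For the cluster described here it would presumably be something like:

bin/spark-submit --class org.dataalgorithms.chap07.spark.FindAssociationRules --master spark://h1:7077 /home/ubuntu/project_spark/data-algorithms-1.0.0.jar ./in/xaa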

learning_dev