
In a Spark standalone cluster, does the Master node run tasks as well? I wasn't sure whether Executor processes are spun up on the Master node to do work alongside the Worker nodes.

Thanks!

Sajjad Hossain
Ranjit Iyer

1 Answer


Executors are started only on nodes that run at least one worker daemon, i.e., no executor is started on a node that does not serve as a Worker.

However, where to start the Master and the Workers is entirely your decision; there is no limitation that says the Master and a Worker cannot co-locate on the same node.

To start a worker daemon on the same machine as your master, you can either edit the conf/slaves file to add the master's IP to it and use start-all.sh at startup, or start a worker on the master node at any time with start-slave.sh, supplying the Spark master URL (spark://master-host:7077).
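A minimal sketch of both approaches, assuming a standard layout under $SPARK_HOME; the hostname master-host and the default port 7077 are placeholders, and the exact argument form of start-slave.sh varies between Spark versions:

```
# Option 1: list the master's own hostname in conf/slaves, then start
# the master and every listed worker together (assumes passwordless SSH
# to each listed host, including the master itself).
echo "master-host" >> conf/slaves
./sbin/start-all.sh

# Option 2: with the master already running, launch a worker daemon
# directly on the master node and point it at the master URL.
./sbin/start-slave.sh spark://master-host:7077
```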

Update (based on Daniel Darabos's suggestion):

When looking at the Application Detail UI's Executors tab, you will also find a row with <driver> as its Executor ID. The driver it denotes is the process where your job is scheduled and monitored: it runs the main program you submitted to the Spark cluster, slicing your transformations and actions on RDDs into stages, scheduling the stages as TaskSets, and arranging for executors to run the tasks.

This <driver> is started on the node where you invoke spark-submit in client mode, or on one of the worker nodes in cluster mode.
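For illustration, here is how the two deploy modes are selected at submission time (a sketch only; the master URL, the application class com.example.MyApp, and the jar name are hypothetical placeholders):

```
# Client mode (the default): the driver runs on the machine where
# spark-submit is invoked; executors run on the workers.
./bin/spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --class com.example.MyApp \
  my-app.jar

# Cluster mode: the driver itself is launched on one of the worker
# nodes, so the submitting machine can disconnect after submission.
./bin/spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar
```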

yjshen
  • The Spark UI's Executors tab lists one executor for the application driver as well. Could you also cover it in your answer, for completeness? Thanks! – Daniel Darabos May 11 '15 at 07:13
  • @DanielDarabos, thanks for your recommendation, it's very useful. I will edit my answer :) – yjshen May 11 '15 at 07:38
  • Thanks! Sounds roughly correct. I thought the `<driver>` executor was not managing stages and RDDs, but was more like the other executors. It's only used when `allowLocal=true` in [`sc.runJob`](https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L1451). But I've never been too sure about how this works — I could very well be wrong. – Daniel Darabos May 11 '15 at 09:24
  • @DanielDarabos, I do not agree with you on the `<driver>`'s role I mentioned above. You know, when we write an application, we use `sparkContext`; the DAGScheduler that is actually invoked when you call `sc.runJob` is the real component that slices a job into stages, and the `TaskScheduler` is also there in SparkContext. – yjshen May 11 '15 at 09:44
  • I think `runJob` is running in the driver application, but not in the `<driver>` executor. It's in the same JVM, so the distinction is academic at best. Your explanation makes sense too. – Daniel Darabos May 11 '15 at 10:25
  • @DanielDarabos, I don't quite understand what you mean by the `<driver> executor` and the `driver`, can you explain more? – yjshen May 11 '15 at 10:32
  • I'm not so sure myself, sorry :). – Daniel Darabos May 11 '15 at 12:23