
I have a machine with 80 cores. I'd like to start a Spark server in standalone mode on this machine with 8 executors, each with 10 cores. But when I try to start a second worker on the same machine as the master, I get an error.

$ ./sbin/start-master.sh
Starting org.apache.spark.deploy.master.Master, logging to ...
$ ./sbin/start-slave.sh spark://localhost:7077 -c 10
Starting org.apache.spark.deploy.worker.Worker, logging to ...
$ ./sbin/start-slave.sh spark://localhost:7077 -c 10
org.apache.spark.deploy.worker.Worker running as process 64606.  Stop it first.

In the documentation, it clearly states "you can start one or more workers and connect them to the master via: ./sbin/start-slave.sh <master-spark-URL>". So why can't I do that?

Ben Caine

3 Answers


One way to get the same parallelism is to start multiple workers on the same machine.

You can do this by adding to the ./conf/spark-env.sh file:

SPARK_WORKER_INSTANCES=8
SPARK_WORKER_CORES=10
SPARK_EXECUTOR_CORES=10
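With those variables set, a single `start-slave.sh` invocation spawns all eight worker instances. A minimal sketch, assuming a standard standalone layout with the master on `localhost:7077`:

```shell
# Sketch: with SPARK_WORKER_INSTANCES=8 and SPARK_WORKER_CORES=10 in
# ./conf/spark-env.sh, one start-slave.sh call launches 8 workers,
# each offering 10 cores to the master.
./sbin/start-master.sh
./sbin/start-slave.sh spark://localhost:7077
# ./sbin/stop-all.sh later stops the master and all worker instances.
```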
Ben Caine

On a single machine it is quite complicated, but you can try Docker or Kubernetes: create a separate Docker container for each Spark worker.
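A sketch of the container approach, assuming the `bitnami/spark` image (its `SPARK_MODE`, `SPARK_MASTER_URL`, and `SPARK_WORKER_CORES` environment variables are that image's convention, not core Spark; check the image's documentation before relying on them):

```shell
# Sketch: one master container plus 8 worker containers on a shared
# Docker network, each worker advertising 10 cores.
docker network create spark-net
docker run -d --name spark-master --network spark-net \
  -e SPARK_MODE=master bitnami/spark
for i in 1 2 3 4 5 6 7 8; do
  docker run -d --name spark-worker-$i --network spark-net \
    -e SPARK_MODE=worker \
    -e SPARK_MASTER_URL=spark://spark-master:7077 \
    -e SPARK_WORKER_CORES=10 \
    bitnami/spark
done
```

Each container gets its own pid namespace, so the "running as process ... Stop it first" check never fires.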

  • That's kind of what I thought. It just can't be done with a single machine. The documentation just needs to be clearer :) – Ben Caine Jan 29 '20 at 21:31

Just specify a new identity for each additional worker (or master) before launching start-worker.sh:

export SPARK_IDENT_STRING=worker2
./spark-node2/sbin/start-worker.sh spark://DESKTOP-HSK5ETQ.localdomain:7077

thanks to https://stackoverflow.com/a/46205968/1743724
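This works because the startup scripts refuse to start a second daemon only when the pid file already exists, and `SPARK_IDENT_STRING` is part of the pid-file name. A sketch for the original 8-worker setup (the master URL and core count are the question's, not anything special):

```shell
# Sketch: a distinct SPARK_IDENT_STRING per worker keeps the per-daemon
# pid files from colliding, so all 8 workers start from one install.
MASTER_URL=spark://localhost:7077
for i in 1 2 3 4 5 6 7 8; do
  SPARK_IDENT_STRING=worker$i ./sbin/start-worker.sh "$MASTER_URL" -c 10
done
```

To stop a particular worker later, set the same `SPARK_IDENT_STRING` before calling the corresponding stop script.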

smbanaei