
I'm trying to host locally a spark standalone cluster. I have two heterogeneous machines connected on a LAN. Each piece of the architecture listed below is running on docker. I have the following configuration

  • master on machine 1 (port 7077 exposed)
  • worker on machine 1
  • driver on machine 2

I use a test application that opens a file and counts its lines. The application works when the file is replicated on all workers and I use SparkContext.textFile()

But when the file is only present on the worker and I use SparkContext.parallelize() to access it on the workers, I get the following output:

```
INFO StandaloneSchedulerBackend: Granted executor ID app-20180116210619-0007/4 on hostPort 172.17.0.3:6598 with 4 cores, 1024.0 MB RAM
INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180116210619-0007/4 is now RUNNING
INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180116210619-0007/4 is now EXITED (Command exited with code 1)
INFO StandaloneSchedulerBackend: Executor app-20180116210619-0007/4 removed: Command exited with code 1
INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20180116210619-0007/5 on worker-20180116205132-172.17.0.3-6598 (172.17.0.3:6598) with 4 cores
```

which repeats over and over without the application ever actually computing anything.

This works when I put the driver on the same PC as the worker, so I guess some kind of connection between the two across the network needs to be permitted. Are you aware of a way to do that (which ports to open, which addresses to add to /etc/hosts, ...)?

Matthias Beaupère

1 Answer


TL;DR Make sure that `spark.driver.host`:`spark.driver.port` can be accessed from each node in the cluster.

In general you have to ensure that all nodes (both executors and master) can reach the driver.

  • In cluster mode, where the driver runs on one of the executors, this is satisfied by default, as long as no ports are closed to these connections (see below).
  • In client mode, the machine on which the driver has been started has to be accessible from the cluster. It means that `spark.driver.host` has to resolve to a publicly reachable address.

In both cases you have to keep in mind that, by default, the driver runs on a random port. It is possible to use a fixed one by setting `spark.driver.port`. Obviously this doesn't work that well if you want to submit multiple applications at the same time.
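A minimal sketch of such a configuration, assuming hypothetical addresses (`192.168.1.10` for the master on machine 1, `192.168.1.20` for the driver machine) and arbitrarily chosen fixed ports:

```python
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setAppName("line-count")
    # Standalone master on machine 1 (address is an example).
    .setMaster("spark://192.168.1.10:7077")
    # Address the executors connect back to; it must be reachable from
    # every node, i.e. the LAN address of machine 2, not a
    # Docker-internal 172.17.x.x address.
    .set("spark.driver.host", "192.168.1.20")
    # Pin the driver's otherwise-random ports so they can be opened in
    # the firewall / published from the driver's container.
    .set("spark.driver.port", "7078")
    .set("spark.driver.blockManager.port", "7079")
)

sc = SparkContext(conf=conf)
```

With the ports fixed like this, the corresponding container ports have to be published (or the containers run with host networking) so the executors can actually reach them.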

Furthermore:

when the file is only present on worker

won't work. All inputs have to be accessible from the driver, as well as from each executor node.
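The difference between the two access patterns can be sketched as follows, assuming an existing SparkContext `sc` and an example path:

```python
# 1) textFile: each executor opens the path itself, so the file must
#    exist at that path on every worker node (or live on shared or
#    distributed storage such as NFS or HDFS).
count_a = sc.textFile("/data/input.txt").count()

# 2) parallelize: the data is read here, on the driver, and then
#    shipped to the executors -- so the file only needs to be readable
#    on the driver machine, but it is never read from the workers.
with open("/data/input.txt") as f:
    lines = f.read().splitlines()
count_b = sc.parallelize(lines).count()
```

Neither variant lets you read a file that exists on a single worker only: `parallelize` reads on the driver, and `textFile` expects the path on all executors.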

zero323
  • Thanks to your answer, I think the issue is with communication between docker containers across the network. – Matthias Beaupère Jan 17 '18 at 14:01
  • I finally started my driver and worker containers with the `--net=host` option and it worked. This is only a temporary solution as it doesn't give me network isolation. I'm now studying docker to find a proper solution. – Matthias Beaupère Jan 22 '18 at 09:31
  • Hi @user6910411 - for some reason when I start the sparkContext from my local machine to a remote spark cluster, the value for spark.driver.host is the local IP of my machine which is obviously not reachable from the worker node. How can this be resolved? – nEO Jan 03 '20 at 05:24
  • @nEO `spark.driver.host` (and optionally `spark.driver.bindAddress`). – user10938362 Jan 04 '20 at 11:58