
First, I used the spark-ec2 script to set up a Spark cluster on EC2 with one master and one worker node.

After connecting to my EC2 master instance via ssh, I want to run the spark-submit script so that I can run my own Spark code. I start by uploading my .jar file, and then I invoke the script.

For this I use the following command:

sudo /root/spark/bin/spark-submit --class "SimpleApp" \
--master spark://ec2-<address>.us-west-1.compute.amazonaws.com:7077 simple-project-1.0.jar

Sadly, this does not work, as the script is unable to connect to the master (full error message at the end):

java.io.IOException: Failed to connect to ec2-<address>.us-west-1.compute.amazonaws.com/<private-IP>:7077

I manually added an inbound rule to my security group that allows access to port 7077, but I still receive the same error. Is there perhaps something I have to do between the setup and the start?
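(For anyone debugging the same thing, a rough sketch of checks to run on the master; the paths assume the spark-ec2 default layout under /root/spark, and SPARK_MASTER_IP is the Spark 1.x variable name:)

# Check on the master whether anything is listening on port 7077;
# if nothing shows up, the master process itself is the problem,
# not the security group.
sudo netstat -tlnp | grep 7077

# The standalone master's web UI (port 8080 by default) shows the
# exact spark://host:port URL it registered under; the --master
# argument must match that URL exactly.
curl -s http://localhost:8080 | grep -o 'spark://[^<"]*'

# If the master bound to the wrong address, it can be pinned in
# spark-env.sh and the master restarted:
echo 'export SPARK_MASTER_IP=<private-IP>' >> /root/spark/conf/spark-env.sh
/root/spark/sbin/stop-master.sh && /root/spark/sbin/start-master.sh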

[ec2-user@ip-172-31-11-100 ~]$ sudo /root/spark/bin/spark-submit --class "SimpleApp" --master spark://<ec2-address>.us-west-1.compute.amazonaws.com:7077 simple-project-1.0.jar 
16/08/02 12:18:43 INFO spark.SparkContext: Running Spark version 1.6.1
16/08/02 12:18:44 WARN spark.SparkConf: 
SPARK_WORKER_INSTANCES was detected (set to '1').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --num-executors to specify the number of executors
 - Or set SPARK_EXECUTOR_INSTANCES
 - spark.executor.instances to configure the number of instances in the spark config.

16/08/02 12:18:44 INFO spark.SecurityManager: Changing view acls to: root
16/08/02 12:18:44 INFO spark.SecurityManager: Changing modify acls to: root
16/08/02 12:18:44 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/08/02 12:18:45 INFO util.Utils: Successfully started service 'sparkDriver' on port 58516.
16/08/02 12:18:45 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/08/02 12:18:45 INFO Remoting: Starting remoting
16/08/02 12:18:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.31.11.100:45559]
16/08/02 12:18:46 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 45559.
16/08/02 12:18:46 INFO spark.SparkEnv: Registering MapOutputTracker
16/08/02 12:18:46 INFO spark.SparkEnv: Registering BlockManagerMaster
16/08/02 12:18:46 INFO storage.DiskBlockManager: Created local directory at /mnt/spark/blockmgr-83f1cf8d-3783-4659-a0da-64ae7c95e850
16/08/02 12:18:46 INFO storage.DiskBlockManager: Created local directory at /mnt2/spark/blockmgr-9a22a761-a18f-45a4-9d49-dcfaf7f9e4f8
16/08/02 12:18:46 INFO storage.MemoryStore: MemoryStore started with capacity 511.5 MB
16/08/02 12:18:46 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/08/02 12:18:46 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/08/02 12:18:46 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/08/02 12:18:46 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/08/02 12:18:46 INFO ui.SparkUI: Started SparkUI at http://ec2-54-153-24-33.us-west-1.compute.amazonaws.com:4040
16/08/02 12:18:46 INFO spark.HttpFileServer: HTTP File server directory is /mnt/spark/spark-12fdcf09-fcfc-4bf6-98d3-ec1f27d21345/httpd-da6f3d59-bc33-4a06-bac9-cb0c27fd82d9
16/08/02 12:18:46 INFO spark.HttpServer: Starting HTTP Server
16/08/02 12:18:46 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/08/02 12:18:47 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:59371
16/08/02 12:18:47 INFO util.Utils: Successfully started service 'HTTP file server' on port 59371.
16/08/02 12:18:47 INFO spark.SparkContext: Added JAR file:/home/ec2-user/simple-project-1.0.jar at http://172.31.11.100:59371/jars/simple-project-1.0.jar with timestamp 1470140327032
16/08/02 12:18:47 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:18:47 WARN client.AppClient$ClientEndpoint: Failed to connect to master ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077
java.io.IOException: Failed to connect to ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
16/08/02 12:19:07 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:19:07 WARN client.AppClient$ClientEndpoint: Failed to connect to master ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077
java.io.IOException: Failed to connect to ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
16/08/02 12:19:27 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:19:27 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:19:27 WARN client.AppClient$ClientEndpoint: Failed to connect to master ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077
java.io.IOException: Failed to connect to ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
16/08/02 12:19:47 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:19:47 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
16/08/02 12:19:47 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:19:47 WARN cluster.SparkDeploySchedulerBackend: Application ID is not initialized yet.
16/08/02 12:19:47 WARN client.AppClient$ClientEndpoint: Failed to connect to master ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077
java.io.IOException: Failed to connect to ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
16/08/02 12:19:47 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52691.
16/08/02 12:19:47 INFO netty.NettyBlockTransferService: Server created on 52691
16/08/02 12:19:47 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/08/02 12:19:47 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.31.11.100:52691 with 511.5 MB RAM, BlockManagerId(driver, 172.31.11.100, 52691)
16/08/02 12:19:47 INFO storage.BlockManagerMaster: Registered BlockManager
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/08/02 12:19:47 INFO ui.SparkUI: Stopped Spark web UI at http://ec2-54-153-24-33.us-west-1.compute.amazonaws.com:4040
16/08/02 12:19:47 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
16/08/02 12:19:47 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
16/08/02 12:19:47 WARN client.AppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
16/08/02 12:19:47 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1038)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.deploy.client.AppClient.stop(AppClient.scala:290)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.org$apache$spark$scheduler$cluster$SparkDeploySchedulerBackend$$stop(SparkDeploySchedulerBackend.scala:198)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.stop(SparkDeploySchedulerBackend.scala:101)
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:446)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1582)
    at org.apache.spark.SparkContext$$anonfun$stop$9.apply$mcV$sp(SparkContext.scala:1740)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1739)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:127)
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:134)
    at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1163)
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:129)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
16/08/02 12:19:47 INFO storage.DiskBlockManager: Shutdown hook called
16/08/02 12:19:47 INFO util.ShutdownHookManager: Shutdown hook called
16/08/02 12:19:47 INFO util.ShutdownHookManager: Deleting directory /mnt/spark/spark-12fdcf09-fcfc-4bf6-98d3-ec1f27d21345/userFiles-7ddf41a5-7328-4bdd-afcd-a4610404ecac
16/08/02 12:19:47 INFO util.ShutdownHookManager: Deleting directory /mnt2/spark/spark-5991f32e-20ef-4433-8de7-44ad57c53d97
16/08/02 12:19:47 INFO util.ShutdownHookManager: Deleting directory /mnt/spark/spark-12fdcf09-fcfc-4bf6-98d3-ec1f27d21345
16/08/02 12:19:47 INFO util.ShutdownHookManager: Deleting directory /mnt/spark/spark-12fdcf09-fcfc-4bf6-98d3-ec1f27d21345/httpd-da6f3d59-bc33-4a06-bac9-cb0c27fd82d9
  • Have you tried to submit it using --master local[x]? Once you are connected to the instance using SSH, it should work, or at least show that the problem is in Spark and not in the network settings – andriosr Aug 02 '16 at 12:54
  • I tried it and it works without a problem. But can I then also access all the slave nodes? Meaning, when I start 10 instances and use --master local[10], will it use all the slaves? – Daniel Töws Aug 02 '16 at 15:22
  • If you are using local[] it's not using any master/slave config, not even on a single node. local is good for debugging but does not test the cluster aspect. I find the 0.0.0.0 in your error suspicious; is this still an issue? If so, either accept the answer or perhaps write your own answer so we know how it worked out. Having said that, have you checked the basic need for password-less ssh between the master and slave (a minimal check is sketched after these comments)? This is true even on a single-node setup: even if you run master/driver and slave/worker on one machine, you have to be able to ssh to yourself with no password. – JimLohse Oct 27 '16 at 18:28
  • Also, you don't mention (or I missed) which flavor of Linux you are running. I have found that in Debian-based versions you have to use the IP address in config files instead of the hostname. [Check this answer and the one linked from there](http://stackoverflow.com/a/34516475/3255525) for more info on this, especially if you are on Ubuntu. – JimLohse Oct 27 '16 at 18:31
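(A minimal sketch of the password-less ssh check mentioned above; ssh-copy-id is assumed to be available, and <slave-address> is a placeholder:)

# Generate a key if none exists yet, copy it to each slave, then
# verify that ssh to the local machine itself needs no password.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id root@<slave-address>
ssh root@localhost true && echo "password-less ssh to self works"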

1 Answer


If you are not using YARN or Mesos as the cluster manager, i.e. you are running in Standalone mode, you have to deploy your application to every node one by one, using spark-submit.

Deploying the application locally (local[n]) over SSH on each node would be fine, assuming you made the right master and slave configuration when setting up the Standalone cluster.

To answer your second question: the local directive only lets you set how many threads the application runs with on a single node, n being the number of threads. It therefore says nothing about whether the job runs on one node or on several.
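For example (a sketch; the class and jar names are taken from the question):

# Runs the driver and 4 worker threads in a single JVM on this
# machine; no standalone master and no slaves are involved.
/root/spark/bin/spark-submit --class "SimpleApp" \
--master local[4] simple-project-1.0.jar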

So, if you use spark-submit to deploy the application to all nodes (master and slaves) locally over SSH, and you have the right Standalone setup, your application should run on all of them.
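As a rough sketch of the manual Standalone startup (the script names are from the Spark standalone documentation; the spark-ec2 script runs the equivalent for you, and <master-address> is a placeholder):

# On the master:
/root/spark/sbin/start-master.sh
# The master's log and web UI (port 8080) report the
# spark://<master-address>:7077 URL it listens on.

# On each slave, pointing it at that exact master URL:
/root/spark/sbin/start-slave.sh spark://<master-address>:7077

# Then submit against the same URL:
/root/spark/bin/spark-submit --class "SimpleApp" \
--master spark://<master-address>:7077 simple-project-1.0.jar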

  • Thank you for the answer, I will try this. I have some questions about your answer though: 1. What configuration of master and slaves would be the right one? I just ran the spark-ec2 script and then tried the spark-submit script. 2. What do you mean by deploying it to each node one by one? Do you mean I run it on each master and slave node separately? – Daniel Töws Aug 02 '16 at 17:06
  • 1) Follow the link in the answer for more details. I'm not familiar with the spark-ec2 script, but Amazon probably grouped the instructions within a script. It would be worth reading the content of the link to get a better understanding of the Spark Standalone structure. 2) Yes, the Spark topology is designed with one master and n slaves; in Standalone mode, you need to deploy your application to each node one by one. – andriosr Aug 02 '16 at 20:56
  • Deploying to each one by one is new information to me. How do I make sure they work together on the same data? How do they communicate when one starts 5 minutes before the 100th node? – Daniel Töws Aug 04 '16 at 10:51
  • As stated here: http://spark.apache.org/docs/latest/spark-standalone.html#starting-a-cluster-manually there is a script to launch the slaves in which you tell Spark which master the slave should connect to. If the EC2 script you used does so, your slaves are already connected, and starting the application on the master will distribute the job across the slaves - however, the application must be deployed on the slaves for the master to be able to distribute the job to them. – andriosr Aug 04 '16 at 12:09
  • Thank you for your help! – Daniel Töws Aug 05 '16 at 10:29
  • Did this work? If so, maybe accept the answer. But I am confused by what I am reading: you don't run spark-submit on each node in a cluster. You DO have options to get your jars to each node in the cluster. The most direct is to copy your jar to each node and set your PATH / classpath to see it. – JimLohse Oct 27 '16 at 18:29