
I can connect to the driver just fine by adding the following:

spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote \
                              -Dcom.sun.management.jmxremote.port=9178 \
                              -Dcom.sun.management.jmxremote.authenticate=false \
                              -Dcom.sun.management.jmxremote.ssl=false
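
With those options in place a JMX client can attach directly to the driver; a minimal sketch, where the driver host name is a placeholder:

jconsole <driver-host>:9178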

But doing ...

spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote \
                                -Dcom.sun.management.jmxremote.port=9178 \
                                -Dcom.sun.management.jmxremote.authenticate=false \
                                -Dcom.sun.management.jmxremote.ssl=false

... only yields a bunch of errors on the driver ...

Container id: container_1501548048292_0024_01_000003
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 1

... and finally crashes the job.

There are no errors on the workers; the executors simply exit with:

[org.apache.spark.util.ShutdownHookManager] - Shutdown hook called

Spark v2.2.0; the cluster is a simple 1-master / 2-worker configuration, and my jobs run without issues when the executor parameters are left out.

habitats
  • Have you checked that the ports are free? If those executors get instantiated on the same machine, the port collisions spell trouble. – Rick Moritz Aug 01 '17 at 13:06
  • Conflicting ports on the worker do indeed seem to be the source of the crash. However, how do I control this otherwise? Setting it to `0` will give me a random one. Is it possible to pass different args to different executors? – habitats Aug 01 '17 at 14:23
  • I would recommend setting the executor memory large enough that only one executor fits on each machine. You may have to adjust your resource manager settings as well. – Rick Moritz Aug 01 '17 at 14:39
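
To illustrate the suggestion in the last comment, a sketch of a submit command that keeps a fixed JMX port but sizes each executor so that YARN can only place one per worker; the memory value, class name and jar are placeholders and have to be tuned against the NodeManager's available resources:

spark-submit \
  --master yarn \
  --executor-memory 12g \
  --conf "spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9178 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" \
  --class com.example.MyJob \
  my-job.jar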

2 Answers


As Rick Moritz pointed out, the issue was conflicting ports for the executor JMX.

Setting:

-Dcom.sun.management.jmxremote.port=0

yields a random port and removes the errors from Spark. To figure out which port it ends up using, run:

netstat -alp | grep 'LISTEN.*<executor-pid>/java'

which lists the currently open ports for that process.
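
For completeness, a sketch of the full lookup on a worker node, assuming the executor JVM runs Spark's standard CoarseGrainedExecutorBackend main class; host, pid and port are placeholders:

# find the executor's pid on the worker node
jps | grep CoarseGrainedExecutorBackend

# list its listening ports and pick out the JMX one
netstat -alp | grep 'LISTEN.*<executor-pid>/java'

# attach from a JMX client
jconsole <worker-host>:<jmx-port>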

habitats

Passing the following configuration with spark-submit worked for me:

--conf "spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9178 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
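
Note that with a fixed port this only works when at most one executor lands on each host (see the comments on the question); otherwise the executors collide on port 9178. Attaching then looks the same as for the driver; the worker host name is a placeholder:

jconsole <worker-host>:9178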

Akshay thakur