I'm launching a distributed Spark application in YARN client mode, on a Cloudera cluster. After some time I see some errors on Cloudera Manager. Some executors get disconnected and this happens systematically. I would like to debug the issue but the internal exception is not reported by YARN.

Exception from container-launch with container ID: container_1417503665765_0193_01_000003 and exit code: 1
ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

How can I see the stack trace of the exception? It seems that YARN reports only that the application exited abnormally. Is there a way to see the Spark executor logs through the YARN configuration?

Nicola Ferraro

1 Answer

Check the NodeManager's yarn.nodemanager.log-dirs property. It is the location where logs are written while a Spark executor container is running.

Note that when the application finishes, the NodeManager may remove these files if log aggregation is enabled. See this document for details: http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
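As a rough illustration of where to look, the sketch below derives the application ID from the container ID shown in the question and builds the paths of the per-container log files, plus the `yarn logs` command that reads aggregated logs once the application has finished. The helper names and the example log directory are hypothetical; the real directory is whatever yarn.nodemanager.log-dirs is set to on your NodeManagers.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Accepts the older container ID form (as in the question) and the newer
# form that carries an epoch segment, e.g. container_e17_...
CONTAINER_ID = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+$")


def application_id(container_id: str) -> str:
    """Derive the YARN application ID from a container ID."""
    match = CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError(f"not a YARN container ID: {container_id!r}")
    cluster_timestamp, app_number = match.groups()
    return f"application_{cluster_timestamp}_{app_number}"


def local_log_files(log_dir: str, container_id: str) -> List[str]:
    """Paths of the stdout/stderr files under one NodeManager log directory.

    log_dir is an assumption: pass one of the directories listed in
    yarn.nodemanager.log-dirs on the node that ran the container.
    """
    base = PurePosixPath(log_dir) / application_id(container_id) / container_id
    return [str(base / name) for name in ("stdout", "stderr")]


def aggregated_logs_command(container_id: str) -> List[str]:
    """Command that prints aggregated logs after the application finishes."""
    return ["yarn", "logs", "-applicationId", application_id(container_id)]


if __name__ == "__main__":
    cid = "container_1417503665765_0193_01_000003"
    # "/var/log/hadoop-yarn/container" is only a placeholder value here.
    for path in local_log_files("/var/log/hadoop-yarn/container", cid):
        print(path)
    print(" ".join(aggregated_logs_command(cid)))
```

The local files exist only on the node that ran the container, and `yarn logs` returns output only when log aggregation is enabled on the cluster.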

mrsrinivas
suztomo
  • Thanks for the reply. This didn't let me find the full stack trace of the exception, but now I know the cause of the problem (OperationNotSupportedException; only the description is present in the log you suggested). If you know a way to find the full stack trace, let me know. – Nicola Ferraro Dec 06 '14 at 22:06
  • You may want to catch the exception inside the function you pass to the transformations. – suztomo Dec 06 '14 at 23:08
  • I am trying to catch it and dump the trace to a local file in the /tmp folder. I was hoping there was a cleaner solution. – Nicola Ferraro Dec 06 '14 at 23:13
  • Just print the stack trace to stdout and check the file in yarn.nodemanager.log-dirs, after confirming which NodeManagers host your executors (by checking the ApplicationMaster's log). – suztomo Dec 07 '14 at 00:05
  • Finding logs used to be so easy, YARN totally breaks everything, now it's a massive faff. – samthebest Dec 12 '14 at 11:22
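The suggestion in the comments above (catch the exception inside the function passed to a transformation and print the full trace) could look roughly like the following Python sketch. The decorator `log_exceptions` and the function `parse_record` are hypothetical names used only for illustration, and the sketch writes to stderr rather than stdout; on a typical setup YARN keeps both files side by side in the same container log directory, but treat that as an assumption to verify on your cluster.

```python
import functools
import traceback


def log_exceptions(func):
    """Wrap func so the full stack trace is printed before re-raising."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # print_exc() writes the complete traceback to sys.stderr.
            traceback.print_exc()
            # Re-raise so the task still fails as it did before.
            raise

    return wrapper


@log_exceptions
def parse_record(line):
    """Hypothetical per-record function passed to a transformation."""
    return int(line)


if __name__ == "__main__":
    print(parse_record("42"))
    try:
        parse_record("not a number")
    except ValueError:
        print("trace was printed to stderr and the error was re-raised")
```

In a PySpark job the wrapped function would be passed to a transformation such as `rdd.map(parse_record)`; in Scala or Java the same idea is a try/catch around the function body that calls `printStackTrace()` and rethrows.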