In 0.9.0, viewing worker logs was simple: they were one click away from the Spark UI home page.

Now (1.0.0+) I cannot find them. Furthermore, the Spark UI stops working when my job crashes! This is annoying: what is the point of a debugging tool that only works when your application does not need debugging? According to http://apache-spark-user-list.1001560.n3.nabble.com/Viewing-web-UI-after-fact-td12023.html I need to find out what my master URL is, but I don't know how to. Spark doesn't print this information at startup; all it says is:

... -Dspark.master=\"yarn-client\" ...

and obviously http://yarn-client:8080 doesn't work. Some sites say that finding logs on YARN has become far more obscure: rather than just being in the UI, you have to log in to the boxes to find them. Surely this is a massive regression and there has to be a simpler way?

How am I supposed to find out what the master URL is? How can I find my worker (now called executor) logs?

samthebest
  • 30,803
  • 25
  • 102
  • 142

2 Answers

Depending on how YARN NodeManager log aggregation is configured, the Spark job logs are aggregated automatically. Runtime logs can usually be found in the following ways:

Spark Master Log

If you're running with yarn-cluster, go to the YARN Scheduler web UI. You can find the Spark Master log there: the "log" button on the job description page shows its content.

With yarn-client, the driver runs inside your spark-submit process, so what you see there is the driver log, provided log4j.properties is configured to write to stderr or stdout.

Spark Executor Log

Search for "executorHostname" in driver logs. See comments for more detail.

suztomo
  • 5,114
  • 2
  • 20
  • 21
  • Please could you expand on "Search for "executorHostname" in driver logs"? Suppose I find the hostnames for my executors, which I do know: how do I then view the logs? – samthebest Dec 14 '14 at 13:42
  • Check the location: yarn.nodemanager.log-dirs determines where the container logs are stored on the node while the containers are running. The default is ${yarn.log.dir}/userlogs. http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/ – suztomo Dec 14 '14 at 13:44
  • Yes, I'm aware that I can ssh into each box, find the actual files and read them. I want to know how to read the logs in a web UI, just like I could in 0.9.0. It seems like a major regression to make me ssh into boxes to find logs. – samthebest Dec 14 '14 at 14:10
  • If yarn.nodemanager.log-dirs is under yarn.log.dir, then you can read the logs via the NodeManager's web UI in the same way as you read the NodeManager's own log. – suztomo Dec 14 '14 at 14:15
  • How do I find the "NodeManager's web UI" URL? I guess I just have to ask my DevOps team what they have configured it to, right? Or is there a self-service way to find out, given one can ssh into the box? – samthebest Dec 14 '14 at 14:38
  • Yes > ask my DevOps team – suztomo Dec 14 '14 at 14:40
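
For anyone who does end up on a node, the directory layout described in the comments can be walked with a short script. This is only a sketch: it assumes the usual layout of one directory per application and one per container under yarn.nodemanager.log-dirs, and the function name `container_log_files` is made up for illustration. Once log aggregation has run, the local files may already have been removed.

```python
from pathlib import Path
from typing import List


def container_log_files(log_dir: str, application_id: str) -> List[Path]:
    """List files under <log_dir>/<application_id>/container_*/ (assumed layout)."""
    app_dir = Path(log_dir) / application_id
    if not app_dir.is_dir():
        # Nothing on this node for that application (or logs already cleaned up).
        return []
    # Each container directory typically holds files such as stdout and stderr.
    return sorted(p for p in app_dir.glob("container_*/*") if p.is_file())
```

Pass the value of yarn.nodemanager.log-dirs from the node's yarn-site.xml as `log_dir`, together with the application ID shown in the YARN UI.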

These answers document how to find them from the command line or the UI:

Where are logs in Spark on YARN?

For UI, on an edge node

Look in /etc/hadoop/conf/yarn-site.xml for the YARN ResourceManager web UI address (yarn.resourcemanager.webapp.address).
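
If you would rather not read the XML by eye, a small script can pull the property out. This is a minimal sketch, assuming the file uses the standard Hadoop `<configuration><property><name>/<value>` layout; the function names `find_property` and `read_yarn_site` are hypothetical. If the property is missing, the cluster is relying on a default, which this sketch does not try to resolve.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_property(root: ET.Element, name: str) -> Optional[str]:
    """Return the <value> of the <property> whose <name> matches, or None."""
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def read_yarn_site(path: str, name: str) -> Optional[str]:
    """Parse a Hadoop-style configuration file and look up one property."""
    return find_property(ET.parse(path).getroot(), name)
```

For example, `read_yarn_site("/etc/hadoop/conf/yarn-site.xml", "yarn.resourcemanager.webapp.address")` should return the host:port to open in a browser when that property is set explicitly.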

Or use command line:

yarn logs -applicationId <application ID> [OPTIONS]
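
To script this, something like the following sketch could wrap the command. It assumes the `yarn` executable is on the PATH and that log aggregation is enabled, so there is something for the command to fetch; the function names are hypothetical, and the application ID is whatever the YARN UI or your spark-submit output reports.

```python
import subprocess
from typing import List


def yarn_logs_command(application_id: str) -> List[str]:
    """Build the argument list for `yarn logs -applicationId <id>`."""
    if not application_id.startswith("application_"):
        raise ValueError(f"not a YARN application ID: {application_id!r}")
    return ["yarn", "logs", "-applicationId", application_id]


def fetch_aggregated_logs(application_id: str) -> str:
    """Run the command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        yarn_logs_command(application_id),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
        check=True,
    )
    return result.stdout
```

Building the argument list separately keeps the ID check testable without a cluster, and passing a list rather than a shell string avoids quoting problems.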
Community
  • 1
  • 1