Why does my yarn application not have logs even with logging enabled?

Question

I have enabled logs in the xml file: yarn-site.xml, and I restarted yarn by doing:

sudo service hadoop-yarn-resourcemanager restart
sudo service hadoop-yarn-nodemanager restart

I ran my application, and then I see the applicationID in yarn application -list. So, I do this: yarn logs -applicationId <application ID>, and I get the following:

hdfs://<ip address>/var/log/hadoop-yarn/path/to/application/  does not have any log files

Do I need to change some other configuration? Or am I accessing the logs the wrong way?

Thank you.

What is the retention policy for your Spark logs? – FaigB Mar 09 '17 at 11:18
I don't know. How do I find out? – makansij Mar 10 '17 at 03:14
Are there any log files in the yarn-log dir? – BruceWayne Mar 23 '17 at 05:16

score 15 · Accepted Answer · answered Mar 23 '17 at 20:36

yarn application -list

will list only the applications that are either in SUBMITTED, ACCEPTED or RUNNING state.

Log aggregation collects each container's logs and moves them to the directory configured in yarn.nodemanager.remote-app-log-dir only after the application has completed. Refer to the description of the yarn.log-aggregation-enable property in the YARN configuration reference (yarn-default.xml).

So the application listed by the command has not completed yet, and its logs have not been collected. Hence the response when trying to access the logs of a running application:

hdfs://<ip address>/var/log/hadoop-yarn/path/to/application/  does not have any log files

You can try the same command yarn logs -applicationId <application ID> to view the logs once the application has completed.

To list all the FINISHED applications, use

yarn application -list -appStates FINISHED

Or to list all the applications

yarn application -list -appStates ALL
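
The check described above can be scripted. The following is only a sketch: it assumes that `yarn application -status` prints a report containing a "State : VALUE" line, and `app_state` and `fetch_logs_when_done` are hypothetical helper names, not part of YARN.

```shell
#!/usr/bin/env bash
# Sketch: only ask for aggregated logs once the application has finished.

# Read a `yarn application -status` report on stdin and print the value of
# its "State" field. The "State : VALUE" layout is an assumption about the
# report format; the "Final-State" line is deliberately not matched.
app_state() {
  awk -F' : ' '$1 ~ /^[ \t]*State$/ { gsub(/[ \t\r]/, "", $2); print $2; exit }'
}

# Hypothetical helper: fetch logs if the application is in a terminal state,
# otherwise report the current state and return non-zero.
fetch_logs_when_done() {
  local app_id="$1" state
  state="$(yarn application -status "$app_id" 2>/dev/null | app_state)"
  case "$state" in
    FINISHED|FAILED|KILLED)
      yarn logs -applicationId "$app_id" ;;
    *)
      echo "Application $app_id is in state '${state:-unknown}'; logs are not aggregated yet." >&2
      return 1 ;;
  esac
}
```

Calling `fetch_logs_when_done <application ID>` then either prints the logs or tells you the application is still running.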

franklinsijo


Can you comment on how to view the logs while the application is still in one of the pre-aggregation phases? Also what if the job has automatic retries, how can we differentiate between runs? – JMess Jun 11 '19 at 14:14
You should be able to see the running container logs in the Application Master UI. – franklinsijo Jun 11 '19 at 14:29
Thank you franklinsijo. In the case that the logs are too large for a browser to display them I am going to the node of the container and then looking in $HADOOP_HOME/logs. Also in regard to my earlier question, container ids will be different between retries – JMess Jun 11 '19 at 15:05
For each task, you should be able to fetch the logs based on their attempt ids. – franklinsijo Jun 11 '19 at 17:49

score 3 · Answer 2 · answered Mar 09 '17 at 03:22

Enable Log Aggregation

Log aggregation is enabled in the yarn-site.xml file by setting the yarn.log-aggregation-enable property to true:

<property>
 <name>yarn.log-aggregation-enable</name>
 <value>true</value>
</property>

Ani Menon


The above is exactly what I have in my `yarn-site.xml` file. What more can I do? – makansij Mar 12 '17 at 17:40
What more can I do? @AniMenon – makansij Mar 20 '17 at 19:17
I am not sure what else could be the problem. Just go to the YARN ResourceManager UI and check whether your job appears in the list of all jobs. – Ani Menon Mar 21 '17 at 07:55
Please refer to the "Configurations for NodeManager" section at https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html – Kanagaraj Dhanapal Mar 23 '17 at 09:45

score 3 · Answer 3 · answered Apr 06 '17 at 12:09

In HDP 2.3.2 (the release covered by the documentation linked below) and higher, you can have log aggregation run hourly for running jobs by using this configuration in yarn-site.xml:

<property>
    <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
    <value>3600</value>
</property>

See this for further details: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/ref-375ff479-e530-46d8-9f96-8b52dadb5183.1.html

Christina Fisher


huh interesting. that's nice – makansij Apr 19 '17 at 18:34
I have this parameter configured already, but still no logs for running jobs. – Averell Jan 26 '19 at 01:12

score 1 · Answer 4 · answered Mar 23 '17 at 11:13

The logs were probably saved under a different application owner. You can try specifying the owner in your command:

yarn logs -applicationId <application ID> -appOwner <user>

ML_TN


score 0 · Answer 5 · edited Feb 02 '22 at 17:36

ROOT CAUSE: When log aggregation has been enabled, each user's application logs will, by default, be placed in the directory hdfs:///app-logs/<USER>/logs/<APPLICATION_ID>. By default, only the user that submitted the job and members of the hadoop group have access to read the log files. In the example directory listing below, you can see that the permissions are 770: no access for anyone other than the owner and members of the hadoop group.

[root@mycluster ~]$ hdfs dfs -ls /app-logs
Found 3 items
drwxrwx---    - hive      hadoop          0 2017-03-10 15:33 /app-logs/hive
drwxrwx---    - user1     hadoop          0 2017-03-10 15:37 /app-logs/user1
drwxrwx---    - spark     hadoop          0 2017-03-10 15:39 /app-logs/spark

SOLUTION: The message above can be deceiving and does not necessarily indicate that log aggregation has not been enabled. To obtain YARN logs for an application, the 'yarn logs' command must be executed as the user that submitted the application. In the example below, the application was submitted by user1. If we execute the same command as the user 'user1', we should get the following output, provided log aggregation has been enabled.

yarn logs -applicationId application_1473860344791_0001
16/09/19 23:10:33 INFO impl.TimelineClientImpl: Timeline service address: http://mycluster.somedomain.com:8188/ws/v1/timeline/
16/09/19 23:10:33 INFO client.RMProxy: Connecting to ResourceManager at mycluster.somedomain.com/192.168.1.89:8050
16/09/19 23:10:34 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/09/19 23:10:34 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
Container: container_e03_1473860344791_0001_01_000001 on mycluster.somedomain.com_45454
LogType:stderr
Log Upload Time:Wed Sep 14 09:44:15 -0400 2016
LogLength:0
Log Contents:
End of LogType:stderr
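
If you are not sure which user owns an application's log directory, the listing shown earlier can be parsed to find out. This is a rough sketch under assumptions: it relies on the usual `hdfs dfs -ls` column order (permissions, replication, owner, group, size, date, time, path), `owner_of` and `logs_as_owner` are hypothetical helper names, and it assumes you are allowed to `sudo -u` to the owning user on your cluster.

```shell
#!/usr/bin/env bash
# Sketch: find the owner of an HDFS directory and run `yarn logs` as that user.

# Read `hdfs dfs -ls` output on stdin and print the owner (third column)
# of the entry whose path (last column) equals the first argument.
owner_of() {
  awk -v path="$1" '$NF == path { print $3; exit }'
}

# Hypothetical helper: look up who owns a per-user log directory such as
# /app-logs/user1, then fetch the application's logs as that user.
logs_as_owner() {
  local app_id="$1" user_dir="$2" owner
  owner="$(hdfs dfs -ls "$(dirname "$user_dir")" | owner_of "$user_dir")"
  if [ -z "$owner" ]; then
    echo "No entry found for $user_dir" >&2
    return 1
  fi
  # Assumes sudo rights to act as the owning user.
  sudo -u "$owner" yarn logs -applicationId "$app_id"
}
```

For the listing above, `logs_as_owner application_1473860344791_0001 /app-logs/user1` would run the command as user1.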

REFERENCE: The following document describes how to use log aggregation to collect logs for long-running YARN applications. http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_yarn-resource-management/content/ch_log_a...
