
We have the following Hadoop cluster versions (the DataNode machines are on Linux OS version 7.2):

Ambari 2.6.1, HDP 2.6.4

We saw a few scenarios where the disks on a DataNode machine became 100% full, because files such as stdout had grown to a huge size.

For example:

/grid/sdb/hadoop/yarn/log/application_151746342014_5807/container_e37_151003535122014_5807_03_000001/stdout

From df -h, we can see:

df -h /grid/sdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        1.8T  1.8T  0T   100% /grid/sdb
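
For context, this is roughly how we locate the offending files across the data disks (a minimal sketch; it assumes the container log root follows the /grid/<disk>/hadoop/yarn/log layout from the example path above):

# List container stdout/stderr files larger than 1 GB on all data disks,
# biggest first, so the responsible application/container is easy to spot.
find /grid/*/hadoop/yarn/log -type f \( -name stdout -o -name stderr \) -size +1G \
  -exec du -h {} + 2>/dev/null | sort -rh | head -20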

Any suggestion on how to avoid this situation where stdout grows huge? This issue actually causes the HDFS component on the DataNode to stop.

Second: since the path of stdout is:

/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout

Is it possible to limit the file size, or to purge stdout when the file reaches a threshold?

Judy

1 Answer


Looking at the path above, it looks like your application (Hadoop job) is writing a lot of data to the stdout file. This generally happens when the job writes data to stdout using System.out.println or a similar function, which is not required but is sometimes used to debug code.

Please check your application code and make sure that it does not write to stdout.
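
If it helps, a quick way to spot such calls is to grep the job source tree (a sketch, assuming a standard Maven-style layout with Java sources under src/main/java):

# Find direct writes to stdout/stderr in the application code;
# each match is a candidate to replace with a proper logger call.
grep -rn --include='*.java' -E 'System\.(out|err)\.print' src/main/java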

Hope this helps.

Pradeep Bhadani
  • Let's say we want a workaround that will search for these files and remove all lines in the files except the last 1000 lines. Can we do that? – Judy Jan 03 '19 at 22:44
  • You can write a cron job which tails the last 1000 lines to another file. But what do you want to achieve with the last 1000 lines? – Pradeep Bhadani Jan 08 '19 at 20:21
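
For anyone who needs the workaround from the comments, here is a rough cron-able sketch (untested; the /grid/*/hadoop/yarn/log root, the 1 GB trigger, and the 1000-line retention are assumptions, and note that a still-running container keeps its own write offset, so truncating in place mainly reclaims disk blocks rather than cleanly resetting the file):

#!/bin/bash
# trim_stdout.sh - keep only the last 1000 lines of oversized container stdout files.
# Example cron entry: */15 * * * * /usr/local/bin/trim_stdout.sh
LOG_ROOT="/grid/*/hadoop/yarn/log"   # assumed layout, adjust per cluster
MAX_SIZE="+1G"                       # only touch files above this size
KEEP_LINES=1000

for f in $(find $LOG_ROOT -type f -name stdout -size "$MAX_SIZE" 2>/dev/null); do
    tmp="${f}.keep"
    # Copy the tail we want to keep, then rewrite the original in place
    # (same inode, so the running container's file handle stays valid).
    tail -n "$KEEP_LINES" "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
done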