We have the following Hadoop cluster versions (DataNode machines run Linux OS version 7.2):
Ambari 2.6.1, HDP 2.6.4
We have seen a few scenarios where disks on a DataNode machine became 100% full, because files such as stdout grow to a huge size.
For example:
/grid/sdb/hadoop/yarn/log/application_151746342014_5807/container_e37_151003535122014_5807_03_000001/stdout
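For reference, this is the kind of command we run to locate the offending files (the /grid/*/hadoop/yarn/log glob is an assumption matching our disk layout; adjust it to your yarn.nodemanager.log-dirs):

find /grid/*/hadoop/yarn/log -type f -name stdout -size +1G -exec ls -lh {} \;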
From df -h we can see:
df -h /grid/sdb
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 1.8T 1.8T 0T 100% /grid/sdb
Any suggestion on how to avoid this situation where stdout files become huge? This issue actually causes the HDFS component on the DataNode to stop.
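For context, we looked at the standard YARN log-retention settings in yarn-site.xml (the values below are only illustrative), but as far as we understand these clean up container logs after an application finishes, so they do not cap the stdout of a still-running container:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
<property>
  <!-- applies only when log aggregation is disabled -->
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>86400</value>
</property>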
Second: since the path of stdout is:
/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout
Is it possible to limit the file size, or to purge stdout when the file reaches a threshold?
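For the purge option, the workaround we are considering (not yet deployed, so treat it as a sketch) is a cron job on each DataNode that truncates any container stdout above a threshold. Unlike rm, truncate frees the disk blocks even while the container process still holds the file open:

#!/bin/bash
# Sketch of a cron-driven purge. Assumptions: log dirs live under
# /grid/*/hadoop/yarn/log, and 5G is an acceptable threshold.
# truncate zeroes the file in place, so the space is reclaimed even
# though the container keeps writing to the same open file descriptor.
find /grid/*/hadoop/yarn/log -type f -name stdout -size +5G -exec truncate -s 0 {} \;

Is something like this safe to run against live container logs, or is there a supported Ambari/YARN setting in HDP 2.6.4 to enforce a per-container stdout size limit?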