
We are running an HDP cluster with 182 data node machines:

HDP version: 2.6.4, Ambari version: 2.6.1

We see the following behavior on the data node machines (it happens on all data node machines and on all disks).

For example, when we run the following command:

ps -eo s,user,cmd | grep '^[RD]'
D hdfs     du -sk /grid/sdj/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
D hdfs     du -sk /grid/sdm/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
R root     ps -eo s,user,cmd

Note: each disk in a data node is 5.4 TB.

We can see that HDFS is running "du -sk" on the data node disks.
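
To put a number on this, the same ps output can be tallied per process state. This is only a minimal sketch: `summarize_du_states` is a hypothetical helper name of ours (not part of Hadoop or HDP), and it assumes the `ps -eo s,user,cmd` column layout shown above.

```shell
# Hypothetical helper: reads `ps -eo s,user,cmd` lines from stdin and prints,
# for each process state, how many `du -sk` processes are in that state.
summarize_du_states() {
  awk '$3 == "du" && $4 == "-sk" { count[$1]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Intended use on a data node:
#   ps -eo s,user,cmd | summarize_du_states
```

Running it periodically would show how many "du -sk" scans sit in the D (uninterruptible sleep) state at any given moment.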

This concerns us because these scans cause a high load average on the machines and sometimes even bad performance.

We understand that HDFS needs to run "du -sk" in order to check the disk space usage, but on the other hand it has a cost: a high load average and sometimes even poor performance.

Is it possible to tell HDFS in some way to disable this verification?

Dave M
King David
