We have an Ambari cluster, version 2.6.1, with Hadoop version 2.6.4.
The cluster has 10 DataNodes.
On the Ambari dashboard, the DataNodes widget shows the following:
DataNodes Live
9/10
But after a few minutes all DataNodes are alive again:
DataNodes Live
10/10
And a few minutes later we again see:
DataNodes Live
9/10
It seems that the NameNode has not received a heartbeat message from one DataNode for longer than the allowed interval, so that DataNode is marked as "dead".
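For reference, a rough sketch of how long that interval is. As far as we understand, the NameNode declares a DataNode dead after 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval. The function name dead_timeout_seconds is our own, and the sketch assumes the first value is in milliseconds and the second is a plain number of seconds (no unit suffix):

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds without a heartbeat before the NameNode
# marks a DataNode dead.
#   $1 = dfs.namenode.heartbeat.recheck-interval (milliseconds)
#   $2 = dfs.heartbeat.interval (seconds, assumed to be a plain integer)
dead_timeout_seconds() {
  local recheck_ms=$1 heartbeat_s=$2
  echo $(( (2 * recheck_ms + 10 * 1000 * heartbeat_s) / 1000 ))
}

# With the stock defaults (300000 ms and 3 s):
dead_timeout_seconds 300000 3   # prints 630

# On a cluster node the real values can be read with hdfs getconf:
# dead_timeout_seconds \
#   "$(hdfs getconf -confKey dfs.namenode.heartbeat.recheck-interval)" \
#   "$(hdfs getconf -confKey dfs.heartbeat.interval)"
```

With the defaults this gives 630 seconds (10.5 minutes).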
We checked the following:
- Hostname resolution is OK (DNS is OK)
- IP resolution is OK (DNS is OK)
- The HDFS service check passes successfully
- Each DataNode process is up ( ps -ef | grep datanode | grep -v grep )
- Port 50010 is listening ( netstat -anp | grep '0.0.0.0:50010' )
- firewalld is stopped, as it should be ( systemctl status firewalld.service )
- SELinux is disabled ( sestatus )
- MTU is configured to 9000 (and we verified that 9000 is set correctly on all components)
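On the MTU point: having 9000 set on each interface does not by itself show that a full-size frame gets through every hop between two hosts. A minimal sketch of an end-to-end test, assuming Linux iputils ping and IPv4; the function names and the host name datanode01 are placeholders of ours:

```shell
#!/usr/bin/env bash
# Hypothetical helper: largest ICMP payload that fits in one unfragmented
# IPv4 packet for a given MTU (20-byte IP header + 8-byte ICMP header).
icmp_payload_for_mtu() {
  local mtu=$1
  echo $(( mtu - 28 ))
}

# Hypothetical helper: send 3 pings with "don't fragment" set, sized to
# fill the MTU exactly. Fails if any hop on the path has a smaller MTU.
check_jumbo_path() {
  local host=$1 mtu=${2:-9000}
  ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}

icmp_payload_for_mtu 9000   # prints 8972

# Example, run from the NameNode host (datanode01 is a placeholder name):
# check_jumbo_path datanode01 9000
```

If the 8972-byte ping fails while a default-size ping works, some device on the path is probably not passing jumbo frames.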
What else can we do to find out why the DataNodes Live count isn't stable?
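One thing we are considering is logging which DataNode drops out and when it was last seen, to check whether it is always the same host. A rough sketch, assuming the report from hdfs dfsadmin contains "Hostname:" and "Last contact:" lines for each node; summarize_report and the log path are names we made up:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "hdfs dfsadmin -report" output on stdin and
# prints one "hostname<TAB>last contact" line per DataNode in it.
summarize_report() {
  awk '
    /^Hostname: /     { sub(/^Hostname: /, ""); host = $0 }
    /^Last contact: / { sub(/^Last contact: /, ""); print host "\t" $0 }
  '
}

# Possible polling loop, run as the hdfs user (interval and path are
# arbitrary choices):
# while true; do
#   date
#   hdfs dfsadmin -report -dead | summarize_report
#   sleep 30
# done >> /tmp/dead_datanodes.log
```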