I am facing a very strange issue. I have installed an HDP 2.6.4 cluster using Ambari on OpenShift containers. Everything works fine except one thing: I have added 8 DataNodes, but only 5 of them are live at any given time.
The main issue is that when I check the NameNode web UI or hdfs dfsadmin -report, the names of the live nodes keep changing over time, while the total number of live nodes stays the same (5/8). I checked the DataNode logs and restarted ambari-agent and the DataNodes; there are no errors in the DataNode logs.
The issue still persists. Please find attached the NameNode web UI screenshots showing different live node names: pic-1 shows node-7 as live, pic-2 shows node-8 as live, and the same keeps happening with the other nodes.
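To track which nodes swap in and out without relying on screenshots, a small sketch like the following could diff two saved hdfs dfsadmin -report outputs. It assumes (not verified against every Hadoop version) that the report lists live nodes after a "Live datanodes (N):" header and dead nodes after a "Dead datanodes (N):" header, and that each node entry starts with a line like "Name: <ip>:<port> (<hostname>)". The function names are hypothetical helpers, not part of Hadoop.

```python
import re

# Assumption: each node entry in the report starts with a line like
# "Name: 10.128.0.1:50010 (node-1)".
NAME_LINE = re.compile(r"^Name:\s+(\S+)(?:\s+\((.*?)\))?", re.MULTILINE)


def live_section(report_text):
    """Return only the part of the report between the live and dead headers."""
    # Assumption: headers look like "Live datanodes (5):" and "Dead datanodes (3):".
    start = report_text.find("Live datanodes")
    if start == -1:
        return ""
    end = report_text.find("Dead datanodes", start)
    return report_text[start:] if end == -1 else report_text[start:end]


def live_node_names(report_text):
    """Return the set of live node names ("ip:port (hostname)") in one snapshot."""
    names = set()
    for addr, host in NAME_LINE.findall(live_section(report_text)):
        names.add(f"{addr} ({host})" if host else addr)
    return names


def diff_snapshots(before, after):
    """Return (joined, left): nodes that became live and nodes that dropped out."""
    old, new = live_node_names(before), live_node_names(after)
    return new - old, old - new
```

Saving the report to two files a few minutes apart (for example hdfs dfsadmin -report > before.txt, then again to after.txt) and passing their contents to diff_snapshots would list exactly which nodes replaced which.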
The NameNode log continuously shows the following entries:
2018-03-07 07:11:03,693 INFO net.NetworkTopology (NetworkTopology.java:add(427)) - Adding a new node: /default-rack/10.128.0.1:50010
2018-03-07 07:11:03,694 INFO blockmanagement.BlockReportLeaseManager (BlockReportLeaseManager.java:registerNode(205)) - Registered DN 56f106cc-cfb6-421f-b9fc-024a84a89c14 (10.128.0.1:50010).
2018-03-07 07:11:03,695 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(401)) - Number of failed storage changes from 0 to 0
2018-03-07 07:11:03,695 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(854)) - Adding new storage ID DS-def957ad-51ed-4d0e-90f4-61582ff01a8a for DN 10.128.0.1:50010
2018-03-07 07:11:03,898 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(954)) - BLOCK* registerDatanode: from DatanodeRegistration(10.128.0.1:50010, datanodeUuid=eac46cc3-a4e6-47e3-a15e-114a298da53e, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-08d20112-9269-47cc-a86d-4e213d221aad;nsid=935392924;c=0) storage eac46cc3-a4e6-47e3-a15e-114a298da53e
2018-03-07 07:11:03,898 INFO namenode.NameNode (DatanodeManager.java:registerDatanode(962)) - BLOCK* registerDatanode: 10.128.0.1:50010
2018-03-07 07:11:03,898 INFO net.NetworkTopology (NetworkTopology.java:remove(501)) - Removing a node: /default-rack/10.128.0.1:50010
2018-03-07 07:11:03,898 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(401)) - Number of failed storage changes from 0 to 0
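In the excerpt above, the same address 10.128.0.1:50010 registers under two different DataNode UUIDs (56f106cc-... and eac46cc3-...) within the same second, followed by "Removing a node" for that address. If this pattern holds for the whole log, it could mean the NameNode is seeing several DataNodes behind one address; that is an unconfirmed guess. A minimal sketch to check it, assuming only the two message formats visible in this excerpt ("Registered DN <uuid> (<ip:port>)" and "DatanodeRegistration(<ip:port>, datanodeUuid=<uuid>, ..."), could group UUIDs by address. The function names are hypothetical.

```python
import re
from collections import defaultdict

# The two NameNode messages in the excerpt that pair a DataNode UUID with its
# transfer address (ip:port).
LEASE_RE = re.compile(r"Registered DN ([0-9a-f-]{36}) \(([^)]+)\)")
REG_RE = re.compile(r"DatanodeRegistration\(([^,]+), datanodeUuid=([0-9a-f-]{36})")


def uuids_by_address(log_lines):
    """Map each transfer address to the set of DataNode UUIDs seen registering from it."""
    seen = defaultdict(set)
    for line in log_lines:
        m = LEASE_RE.search(line)
        if m:
            seen[m.group(2)].add(m.group(1))
        m = REG_RE.search(line)
        if m:
            seen[m.group(1)].add(m.group(2))
    return seen


def shared_addresses(log_lines):
    """Keep only addresses that were used by more than one DataNode UUID."""
    return {addr: uuids
            for addr, uuids in uuids_by_address(log_lines).items()
            if len(uuids) > 1}
```

Passing the lines of the NameNode log file to shared_addresses would show whether all 8 DataNode UUIDs are registering from only a few addresses (e.g. 5), which would line up with the constant 5/8 live count.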
Any direction will be helpful.
-- Thanks