I am trying to set up HDFS and Cloudera Manager via the Cloudera Manager API. However, I am stuck at a specific point:

I set up all the HDFS roles, but the NameNode refuses to communicate with the DataNodes. The relevant error from the DataNode log:

Initialization failed for Block pool BP-1653676587-172.168.215.10-1435054001015 (Datanode Uuid null) service to master.adastragrp.com/172.168.215.10:8022 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(172.168.215.11, datanodeUuid=1a114e5d-2243-442f-8603-8905b988bea7, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=cluster4;nsid=103396489;c=0)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:917)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:5085)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1140)
    at 

My DNS is configured via the hosts file, so I thought the following answer applied and tried its solution, but without success: https://stackoverflow.com/a/29598059/1319284

However, I have another small cluster with basically the same configuration as far as I can tell, and that one is working. DNS is configured through /etc/hosts as well, but there I set up the cluster via the Cloudera Manager GUI instead of the API.
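For what it's worth, name resolution itself can be checked on each host with standard commands along these lines (hostnames taken from the error and topology file above):

# on the NameNode host: how the DataNode's name resolves according to /etc/hosts
getent hosts slave.adastragrp.com
# on each host: its own FQDN and the address that FQDN resolves to
hostname -f
hostname -i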

After that I finally found the configuration directory of the running NameNode process, which contains a dfs_hosts_include file. Opening it revealed that only 127.0.0.1 is listed. On the working cluster, all the nodes are listed in that file. I found a similar oddity in topology.map:

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<topology>
  <node name="master.adastragrp.com" rack="/default"/>
  <node name="127.0.0.1" rack="/default"/>
  <node name="slave.adastragrp.com" rack="/default"/>
  <node name="127.0.0.1" rack="/default"/>
</topology>

... That doesn't look right. Again, on the working cluster the IPs are as expected.
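For comparison, this is roughly what I would expect the file to look like here (the slave's address is the one from the DatanodeRegistration in the error above):

<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<topology>
  <node name="master.adastragrp.com" rack="/default"/>
  <node name="172.168.215.10" rack="/default"/>
  <node name="slave.adastragrp.com" rack="/default"/>
  <node name="172.168.215.11" rack="/default"/>
</topology>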

Not only do I not know what went wrong, I also do not know how to influence these files, as they are all auto-generated by Cloudera Manager. Has anyone seen this before and can provide some guidance?
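For completeness, since I am driving the whole setup through the REST API anyway, the host inventory that Cloudera Manager uses to generate these files can be dumped the same way, for example (credentials, port and API version here are just the defaults and may differ):

curl -u admin:admin 'http://master.adastragrp.com:7180/api/v10/hosts'

The ipAddress reported for each host presumably shows the same wrong 127.0.0.1 here, but that still does not tell me where it comes from or how to correct it.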

1 Answer


I finally found the source of the problem: /etc/cloudera-scm-agent/config.ini

I generated this file with a template, and ended up with

listening_ip=127.0.0.1

which the cloudera-scm-agent happily reported to the server. For more information, see the question Salt changing /etc/hosts, but still caching old one?
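For reference, the relevant part of a corrected /etc/cloudera-scm-agent/config.ini would look roughly like this on the slave node (the address is simply the DataNode address from the question; leaving listening_ip unset should also work, since the agent then detects the address itself):

# /etc/cloudera-scm-agent/config.ini (relevant lines only)
server_host=master.adastragrp.com
# was: listening_ip=127.0.0.1 -- the agent then reports the loopback address to the server
listening_ip=172.168.215.11

After fixing the template, the agent needs a restart (service cloudera-scm-agent restart) so that it re-registers with the correct address.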

  • I found that the Cloudera agent configuration has nothing to do with that. The only thing that makes HDFS work is to make sure that the hostname is bound to 127.0.0.1, because all traffic originates from that address and the DFS NameNode matches the origin IP address with the DataNode hostname. – Jari Turkia Apr 10 '18 at 14:03
  • @JariTurkia I am not sure I understand your comment. The issue was that the NameNode mapped all hosts to the loopback adapter, which is wrong because the cluster had more than one node, and those nodes were not on the same host as the NameNode. The scm-agent reported to the NameNode that it was listening on the loopback adapter; instead it should have reported the network-facing IP it was listening on (which is going to be your origin IP). The origin IP is not 127.0.0.1 and cannot be 127.0.0.1 if the node is on a different host. – kutschkem Apr 10 '18 at 14:34
  • OK, then this case is a bit different from mine. I had a single-host setup and all hosts had 127.0.1.1 as their IP address. – Jari Turkia Apr 10 '18 at 16:41
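For illustration, the 127.0.1.1 situation from the last comment corresponds to a Debian/Ubuntu-style /etc/hosts along these lines (hostname and address are hypothetical); in a multi-node cluster each FQDN would instead need to resolve to that node's network-facing address:

# /etc/hosts on the single host
127.0.0.1    localhost
127.0.1.1    node1.example.com node1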