
I have a cluster of Pis that I'm using to experiment with Hadoop. masternode is set to .190, p1 to 191 ... p4 to 194. All nodes are up and running. start-dfs.sh, stop-all.sh, etc. from the master successfully start and stop the datanodes. However, on start, the datanodes cannot connect back to the master node. The datanodes appear to be using "hostname/ip_address:9000" to reconnect.

hadoop-hduser-datanode-p1.log reports:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.1.190:9000. Already tried 8 time(s);

masternode is set to 192.168.1.190 via a DHCP reservation by MAC address on my router. Same goes for the other nodes.

/etc/hosts is empty on the datanodes. Setting them doesn't change the behavior.

All the .xml files (like core-site.xml) use "hdfs://masternode:port". None of them use "masternode/ip_address:port", so I'm not sure where the IP address is coming from.

    <property>
        <name>fs.default.name</name>
        <value>hdfs://masternode:9000/</value>
    </property>
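
As an aside, `fs.default.name` is the deprecated key name; on Hadoop 2.x and later the same setting is normally written under `fs.defaultFS` (value unchanged from above, and the old key still works):

```xml
<!-- core-site.xml: current key name; fs.default.name is its deprecated alias -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://masternode:9000/</value>
</property>
```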

The workers file just lists the datanode hostnames:

p1
p2
p3
p4

Any ideas what is appending the IP address to the hostname?

Snap E Tom
  • I *think* this is a red herring: the log message simply includes the IP address the hostname resolves to, printed as `hostname/ip-address-of-hostname:port`, and there is no configuration error in that regard. Your problem is probably something else. – diya Oct 09 '22 at 13:06
  • You are correct, thank you. I changed the url in core-site.xml on the datanodes and indeed the lookups failed, and the IP address was no longer there. Your answer helped lead me down tracking the root cause of the real issue. – Snap E Tom Oct 09 '22 at 20:05

1 Answer


After diya pointed out that the IP address in the log was merely diagnostic output, it was clear there was some sort of connection issue between the datanodes and the namenode on port 9000.

I could SSH into the master from the datanodes, but `nc -zv masternode 9000` confirmed that the datanodes could not connect to the masternode over port 9000. `netstat -lnt` on the masternode confirmed that port 9000 was bound only to 127.0.0.1. This led me to this answer: https://stackoverflow.com/a/64611530/213017. I checked /etc/hosts and there was indeed an entry mapping 127.0.0.1 to masternode. Removing that entry let the datanodes connect.
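
The sequence of checks can be sketched as follows. The hostname `masternode` and port 9000 come from the question; the first two commands are shown as comments because they must run on the actual cluster, while the offending /etc/hosts line is reproduced in a sample file so the final grep is self-contained:

```shell
#!/bin/sh
# On a datanode, test raw TCP connectivity to the namenode RPC port:
#   nc -zv masternode 9000
# On the masternode, check what address port 9000 is bound to:
#   netstat -lnt | grep ':9000'   # 127.0.0.1:9000 means loopback-only

# Root cause: a loopback mapping of the cluster hostname. Reproduce the
# bad /etc/hosts from the answer in a sample file:
cat > /tmp/hosts.sample <<'EOF'
127.0.0.1 localhost
127.0.0.1 masternode
EOF

# This grep exposes the line that must be removed so the namenode binds
# port 9000 to the LAN address instead of loopback:
grep -E '^127\.' /tmp/hosts.sample | grep -w masternode
# -> 127.0.0.1 masternode
```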

Snap E Tom
  • Unfortunately, I'm facing the same issue. But in my case, the namenode indeed listens on all interfaces. More than that, I'm able to connect from the second host to the namenode via telnet. netstat shows an established connection, and if I type "abc" it gives a response: "org.apache.hadoop.ipc.RPC$VersionMismatch*>Server IPC version 9 cannot communicate with client version 100:@Connection closed by foreign host." So there is no network issue here. But the datanode still can't connect. – Yuri Dolotkazin Jun 07 '23 at 02:24