
I have set up a multinode Hadoop cluster with 3 datanodes and 1 namenode using VirtualBox on Ubuntu. My host system serves as the NameNode (and also a DataNode) and two VMs serve as DataNodes. My systems are:

  1. 192.168.1.5: NameNode (also datanode)
  2. 192.168.1.10: DataNode2
  3. 192.168.1.11: DataNode3

I am able to SSH into each system from every other system. The hadoop/etc/hadoop/slaves file on all systems has these entries:

192.168.1.5
192.168.1.10
192.168.1.11

hadoop/etc/hadoop/master on all systems has the entry: 192.168.1.5

core-site.xml, yarn-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh are all the same on every machine, except that the dfs.namenode.name.dir entry is absent from hdfs-site.xml on both DataNodes. When I execute start-dfs.sh and start-yarn.sh from the NameNode, everything works fine, and through jps I can see all required services on all machines.
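For context, a minimal sketch of the fs.defaultFS entry such a setup would carry in core-site.xml. The host and port here are assumptions on my part, inferred from the "NameNode":9000 destination quoted in the log excerpts in the comments below, not copied from the actual cluster:

```xml
<!-- core-site.xml (sketch; host/port are inferred, not taken from the cluster) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.5:9000</value>
  </property>
</configuration>
```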


Jps on NameNode:
5840 NameNode
5996 DataNode
7065 Jps
6564 NodeManager
6189 SecondaryNameNode
6354 ResourceManager

Jps on DataNodes:
3070 DataNode
3213 NodeManager
3349 Jps

However, when I check namenode:50070/dfshealth.html#tab-datanode and namenode:50070/dfshealth.html#tab-overview, both indicate only 2 datanodes.

The tab-datanode page shows NameNode and DataNode2 as active datanodes; DataNode3 is not displayed at all.

I checked all configuration files (the XML files, hadoop-env.sh, and the slaves/master files mentioned above) multiple times to make sure nothing is different on the two DataNodes.

The /etc/hosts file on all systems also contains entries for every node:

127.0.0.1       localhost
#127.0.1.1      smishra-VM2
192.168.1.11    DataNode3
192.168.1.10    DataNode2
192.168.1.5     NameNode

One thing I'd like to mention: I configured one VM first and then cloned it, so both VMs have the same configuration. That makes it all the more confusing that one DataNode is shown but not the other.

  • What have you tried so far? Did you check the logs for any exceptions? Did you try restarting the cluster? – SMA Oct 12 '14 at 13:20
  • I restarted the VMs as well as the host machine multiple times. At one point it showed DataNode3 as active, but then DataNode2 was missing. Which log file should I look into? – Santosh Mishra Oct 12 '14 at 14:10
  • Go to the log directory and do ls *data* – SMA Oct 12 '14 at 14:13
  • In hadoop-smishra-datanode-NameNode.log I found the only exception: 2014-10-12 17:31:08,687 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService java.io.IOException: Failed on local exception: java.io.IOException: Connection reset by peer; Host Details : local host is: "NameNode/192.168.1.5"; destination host is: "NameNode":9000; – Santosh Mishra Oct 12 '14 at 14:58
  • In hadoop-smishra-datanode-DataNode3.log I found this exception: 2014-10-12 03:21:28,271 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode3:50010:DataXceiver error processing unknown operation src: /192.168.1.5:59441 dst: /192.168.1.11:50010 java.io.IOException: Version Mismatch (Expected: 28, Received: 18245 ) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:57) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:211) at java.lang.Thread.run(Thread.java:745) – Santosh Mishra Oct 12 '14 at 15:10
  • In yarn-smishra-nodemanager-DataNode3.log I found: 2014-10-12 03:00:38,194 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater java.net.ConnectException: Call From DataNode3/192.168.1.11 to NameNode:8031 failed on connection exception: java.net.ConnectException: Connection refused – Santosh Mishra Oct 12 '14 at 15:12
  • No Almas... I couldn't find the reason. I am able to SSH from the host to the VMs and vice versa, so I can't see why there would be a network issue. As for the version mismatch issue: both my VMs are clones of each other, so why am I getting the issue with only one VM? – Santosh Mishra Oct 12 '14 at 16:35
  • I reconfirmed with java -version and hadoop version, and both VMs return the same values: java version "1.7.0_65", Hadoop 2.5.1 – Santosh Mishra Oct 12 '14 at 21:02
  • Please check the inbound security settings of the cluster – Nag Oct 13 '14 at 09:40
  • Sorry, but I am new to Hadoop, so can you please explain how to do that? – Santosh Mishra Oct 13 '14 at 13:33
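A hedged aside on the "Version Mismatch (Expected: 28, Received: 18245)" log line quoted in the comments above: 18245 is the big-endian ASCII encoding of "GE", i.e. the first two bytes of an HTTP GET request. That typically means an HTTP client (a browser or the web UI) hit the DataNode's data transfer port 50010, rather than an actual Hadoop version mismatch between the nodes. The decoding is easy to verify:

```python
# Decode the "Received: 18245" value from the DataNode3 log as two
# big-endian ASCII bytes: 18245 == 0x4745 == "GE", the start of "GET ".
received = 18245
decoded = received.to_bytes(2, "big").decode("ascii")
print(decoded)  # prints GE
```

So this particular error is likely noise from the earlier attempt to reach the node over HTTP, and the ConnectException to NameNode:8031 in the NodeManager log is the more telling symptom.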

1 Answer


Take a look at http://blog.cloudera.com/blog/2014/01/how-to-create-a-simple-hadoop-cluster-with-virtualbox/

I'll bet that your problems come from the network configuration of your VirtualBox VMs. The post above goes into detail on how to ensure that the internal network between the VMs is set up correctly, with forward and reverse name resolution working, no duplicate MAC addresses, etc., which is critical for a Hadoop cluster to work correctly.
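Since the VMs are clones of each other, one thing worth ruling out is a duplicate or conflicting entry that the cloning process can introduce into /etc/hosts. A small script along these lines (a sketch I'm adding here, not part of the original thread; the sample content is the hosts file from the question) flags duplicate IPs or duplicate hostnames:

```python
# Sanity-check an /etc/hosts body for duplicate IPs or duplicate
# hostnames -- a common artifact when VMs are cloned.
from collections import Counter

def hosts_conflicts(hosts_text):
    """Return (duplicate_ips, duplicate_names) found in an /etc/hosts body."""
    ips, names = [], []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        ips.append(parts[0])          # first field is the address
        names.extend(parts[1:])       # remaining fields are hostnames/aliases
    dups = lambda xs: sorted(k for k, c in Counter(xs).items() if c > 1)
    return dups(ips), dups(names)

hosts = """\
127.0.0.1       localhost
#127.0.1.1      smishra-VM2
192.168.1.11    DataNode3
192.168.1.10    DataNode2
192.168.1.5     NameNode
"""
print(hosts_conflicts(hosts))  # -> ([], []) for the file shown in the question
```

If this prints empty lists on every node, the next things to compare on the two clones are the hostname itself (`hostname -f`) and the MAC address of each VM's network adapter, per the post above.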

ioss