
We use two network interfaces for our Hadoop cluster: a private one (eth1) and a public one. It looks like when the Hadoop DataNode starts, it picks the public IP address instead of the private one. When I look at hadoop-cmf-hdfs-DATANODE-hostname.log.out, it shows

STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = hostname.public.net/208.x.x.x

where it should instead say

STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = hostname-eth1.private.net/192.168.x.x
user2562618

2 Answers


There is a setting in hdfs-site.xml that controls which interface the DataNode uses as its IP address:

dfs.datanode.dns.interface — the name of the network interface from which a DataNode should report its IP address.

This is set to "default". If you want to use eth1 instead, set this property in hdfs-site.xml as:

<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth1</value>
</property>

To quote from the book "Hadoop: The Definitive Guide":

There is also a setting for controlling which network interfaces the datanodes use as their IP addresses (for HTTP and RPC servers). The relevant property is dfs.datanode.dns.interface, which is set to default to use the default network interface. You can set this explicitly to report the address of a particular interface (eth0, for example).

Manjunath Ballur
  • 6,287
  • 3
  • 37
  • 48
  • Thank you, but that setting uses a reverse DNS query to get the hostname bound to the IP address on the eth1 interface. Unfortunately I don't have reverse DNS in place. Is there somewhere I can insert the fully qualified hostname-eth1 so it doesn't look for reverse DNS? – user2562618 Oct 20 '15 at 17:14
  • There is one more setting: "dfs.namenode.datanode.registration.ip-hostname-check". See the description of this setting here: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml – Manjunath Ballur Oct 20 '15 at 17:28
  • No luck. In /var/log/hadoop-hdfs it is binding the wrong hostname: `org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576 2015-10-20 18:47:08,408 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: File descriptor passing is enabled. 2015-10-20 18:47:08,409 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is hostname.public.net`. The last line should be `org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is hostname-eth1.private.net` – user2562618 Oct 20 '15 at 17:56
  • I checked the Hadoop code. While getting the hostname, it calls DNS.getDefaultHost(), which takes two parameters: interface and nameserver. For setting the nameserver, you need to set one more parameter: "dfs.datanode.dns.nameserver". This can also be left at its "default" value. – Manjunath Ballur Oct 20 '15 at 18:37
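
Putting the comment thread together, a sketch of the relevant hdfs-site.xml entries might look like this. This is an illustration only: `eth1` matches the interface from the question, `dfs.datanode.dns.nameserver` is shown at its "default" value as discussed above, and the registration check is the optional setting mentioned in the comments (it belongs in the NameNode's configuration):

<!-- Sketch only: report the IP of eth1 instead of the default interface -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth1</value>
</property>

<!-- Nameserver used for the lookup; "default" means the system resolver -->
<property>
  <name>dfs.datanode.dns.nameserver</name>
  <value>default</value>
</property>

<!-- Optional, on the NameNode: relax the hostname check when DataNodes register,
     which can help when reverse DNS is not in place -->
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>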

Can you try this property, quoted from the Apache web site? It controls whether datanodes use datanode hostnames when connecting to other datanodes for data transfer.

<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether datanodes should use datanode hostnames when
    connecting to other datanodes for data transfer.
  </description>
</property>

Also check the other DataNode properties, such as dfs.datanode.address, in the HDFS property reference (hdfs-default.xml); you may find a solution there.
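
For instance, dfs.datanode.address controls which address the DataNode's data-transfer server binds to (its default is 0.0.0.0:50010 in Hadoop 2.x, i.e. all interfaces). A sketch of pinning it to the private interface, where 192.168.1.10 is a hypothetical address standing in for the real eth1 IP masked in the question:

<!-- Sketch only: 192.168.1.10 is a hypothetical private eth1 address;
     50010 is the Hadoop 2.x default data-transfer port -->
<property>
  <name>dfs.datanode.address</name>
  <value>192.168.1.10:50010</value>
</property>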

One more thing: check the IP/domain-name mapping in your hosts file.
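
For example, hosts-file entries mapping each hostname to the intended interface might look like this. The addresses and names are illustrative only, standing in for the masked values in the question:

# /etc/hosts — illustrative entries only
192.168.1.10   hostname-eth1.private.net   hostname-eth1
203.0.113.5    hostname.public.net         hostname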

Ravindra babu