In my CDH 5.4 cluster, I have a gateway node that is on both the private and the public network. The rest of the cluster is on the private network only. I want to use Sqoop to pull data from a database server that is on the public network. When I issue the command, the map tasks fail with:

Error: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Unknown host specified

I understand that the cluster nodes cannot access the db server on the public network.

Given that this public/private network architecture is quite common in the industry, what is the correct way to enable the datanodes to access a server on the public network?

Any help is much appreciated.

Gateway node

$>route -v
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.248.200.0    *               255.255.255.0   U     0      0        0 bond_internal
192.168.196.0   *               255.255.252.0   U     0      0        0 bond_external
link-local      *               255.255.0.0     U     1007   0        0 bond_external
link-local      *               255.255.0.0     U     1008   0        0 bond_internal
default         192.168.196.1   0.0.0.0         UG    0      0        0 bond_external

Datanode

$>route -v
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.248.200.0    *               255.255.255.0   U     0      0        0 bond0
link-local      *               255.255.0.0     U     1008   0        0 bond0
scott
  • Perhaps a proxy through the gateway node or using Sentry – OneCricketeer Aug 05 '16 at 00:03
  • Not exactly a Hadoop issue; this looks like a network routing issue. Can you add the routing table of the node on which you are running Sqoop? Use `route -v`. – ViKiG Aug 05 '16 at 09:10
  • @vkgade - thanks for your reply. I added the routing table information. – scott Aug 05 '16 at 14:03
  • What is the gateway node IP on the 10.248.200.0 network? That IP address should be put as gateway for the datanode. I mentioned how to do it in my answer. – ViKiG Aug 05 '16 at 15:18
  • 10.248.200.2 is the IP. I will try setting this IP as the gateway on the datanodes. On the gateway node, net.ipv4.ip_forward = 0; I will change that to 1. – scott Aug 06 '16 at 11:36

1 Answer

There are two things you need to check.

First, check whether the node on which Sqoop is running has the correct gateway. You can inspect this in the routing table and change it with the route command:

    route add default gw IP_OF_GATEWAY_NODE NETWORK_INTERFACE_NAME_ON_THIS_NODE  # the interface name is what ifconfig shows
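
As a rough sketch of that check, the snippet below scans a routing table for a default entry and prints the `route add` command only when none is found. The function names `has_default_route` and `suggest_route_fix` are hypothetical, and the gateway address 10.248.200.2 and interface `bond0` are assumptions taken from the question and its comments, so adjust them for your hosts.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the routing table read from stdin
# contains a default route ("default" in `route` output, "0.0.0.0" in `route -n`).
has_default_route() {
    awk '$1 == "default" || $1 == "0.0.0.0" { found = 1 } END { exit !found }'
}

# Assumed values from the question; replace with your own.
GATEWAY_IP="10.248.200.2"
IFACE="bond0"

# Print (rather than run) the fix, so nothing changes until you run it as root.
suggest_route_fix() {
    if has_default_route; then
        echo "default route present"
    else
        echo "route add default gw $GATEWAY_IP $IFACE"
    fi
}

# Usage on a datanode: route -n | suggest_route_fix
```

Fed the datanode table from the question, which has no default entry, this prints the `route add` line; fed the gateway node's table, it reports that a default route is already present.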

Next, verify that the gateway node is actually forwarding packets. To do this, edit /etc/sysctl.conf (the path may differ depending on your Linux distribution). To turn IP forwarding on permanently, set:

    # if the value is '0', set it to '1', then restart the network
    net.ipv4.ip_forward=1

To turn IP forwarding on temporarily (the setting reverts to its old value after a reboot), run:

    sysctl -w net.ipv4.ip_forward=1
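
To confirm which state the gateway node is in, you can read the current value back. The sketch below is only illustrative: `forwarding_status` is a hypothetical helper that interprets the value, and it assumes a Linux host, where the setting is exposed under /proc/sys/net/ipv4/ip_forward.

```shell
#!/bin/sh
# Hypothetical helper: interpret the value of net.ipv4.ip_forward.
# The value is passed as an argument, so the function itself changes nothing.
forwarding_status() {
    case "$1" in
        1) echo "forwarding enabled" ;;
        0) echo "forwarding disabled" ;;
        *) echo "unexpected value: $1" ;;
    esac
}

# Usage on the gateway node:
# forwarding_status "$(cat /proc/sys/net/ipv4/ip_forward)"
```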
ViKiG