Configuration/Topology:

There are three machines:

hadoop2                             | hadoop (router)               | driver
eth0 10.10.15.3                     | eth0 10.10.15.2               | tap0 192.168.0.199
default route via 10.10.15.1        | tap0 192.168.0.195            | route 10.10.15.0/24 via 192.168.0.195
route 192.168.0.0/24 via 10.10.15.2 | default route via 10.10.15.1  |
no iptables rules                   | route 192.168.0.0/24 dev tap0 |
                                    | no iptables rules             |
                                    | ip_forward = 1                |
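For reference, the routes in the table could be created with iproute2 commands along these lines. This is a sketch, not the commands that were actually used: it assumes the interfaces already carry the listed addresses (so hadoop's 192.168.0.0/24 route on tap0 already exists as a connected route), and the `run` helper and `setup_*` function names are hypothetical. By default it only prints the commands; set `DRY_RUN=0` to apply them as root on the matching machine.

```shell
#!/usr/bin/env bash
# Sketch of the routing described in the topology table.
# Assumption: eth0/tap0 are already up with the addresses listed above.
# By default every command is only printed; set DRY_RUN=0 to apply (as root).
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_hadoop2() {
  # Reach the 192.168.0.0/24 tap network through the router VM "hadoop".
  run ip route replace 192.168.0.0/24 via 10.10.15.2 dev eth0
}

setup_hadoop() {
  # Let hadoop forward packets between eth0 (10.10.15.0/24) and tap0 (192.168.0.0/24).
  run sysctl -w net.ipv4.ip_forward=1
}

setup_driver() {
  # Send traffic for 10.10.15.0/24 to hadoop's tap0 address.
  run ip route replace 10.10.15.0/24 via 192.168.0.195 dev tap0
}
```

On hadoop2, for instance, `setup_hadoop2` prints `ip route replace 192.168.0.0/24 via 10.10.15.2 dev eth0`.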

Routing table on hadoop2:

root@hadoop2:~# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.10.15.1      0.0.0.0         UG        0 0          0 eth0
10.10.15.0      0.0.0.0         255.255.255.0   U         0 0          0 eth0
192.168.0.0     10.10.15.2      255.255.255.0   UG        0 0          0 eth0

Routing table on hadoop:

root@hadoop:~# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.10.15.1      0.0.0.0         UG        0 0          0 eth0
10.10.15.0      0.0.0.0         255.255.255.0   U         0 0          0 eth0
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 tap0

Issue

Pinging 10.10.15.3 (hadoop2) from 192.168.0.199 (driver) works correctly:

PING 10.10.15.3 (10.10.15.3) 56(84) bytes of data.
64 bytes from 10.10.15.3: icmp_req=1 ttl=63 time=55.9 ms
64 bytes from 10.10.15.3: icmp_req=2 ttl=63 time=55.5 ms
64 bytes from 10.10.15.3: icmp_req=3 ttl=63 time=57.8 ms

Tcpdump on the router (hadoop):

root@hadoop:~# tcpdump -n icmp -i eth0
08:53:11.899079 IP 192.168.0.199 > 10.10.15.3: ICMP echo request, id 20880, seq 1, length 64
08:53:11.899789 IP 10.10.15.3 > 192.168.0.199: ICMP echo reply, id 20880, seq 1, length 64
08:53:12.900885 IP 192.168.0.199 > 10.10.15.3: ICMP echo request, id 20880, seq 2, length 64
08:53:12.901497 IP 10.10.15.3 > 192.168.0.199: ICMP echo reply, id 20880, seq 2, length 64
08:53:13.903734 IP 192.168.0.199 > 10.10.15.3: ICMP echo request, id 20880, seq 3, length 64
08:53:13.904351 IP 10.10.15.3 > 192.168.0.199: ICMP echo reply, id 20880, seq 3, length 64

But pinging in the other direction (from 10.10.15.3 to 192.168.0.199), or even just to the router's address, does not work, because the source address gets changed along the way. Tcpdump on hadoop2:

root@hadoop2:~# tcpdump icmp -ne -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:19:48.778020 52:54:00:f2:e6:4b > 52:54:00:82:d4:ac, ethertype IPv4 (0x0800), length 98: 10.10.15.3 > 192.168.0.199: ICMP echo request, id 3409, seq 1, length 64
10:19:49.786993 52:54:00:f2:e6:4b > 52:54:00:82:d4:ac, ethertype IPv4 (0x0800), length 98: 10.10.15.3 > 192.168.0.199: ICMP echo request, id 3409, seq 2, length 64
10:19:50.794744 52:54:00:f2:e6:4b > 52:54:00:82:d4:ac, ethertype IPv4 (0x0800), length 98: 10.10.15.3 > 192.168.0.199: ICMP echo request, id 3409, seq 3, length 64

Looks fine, doesn't it? But on the router (hadoop):

root@hadoop:~# tcpdump -n icmp -i eth0
08:55:37.688153 IP 10.10.15.1 > 192.168.0.199: ICMP echo request, id 3382, seq 81, length 64
08:55:37.742960 IP 192.168.0.199 > 10.10.15.1: ICMP echo reply, id 3382, seq 81, length 64
08:55:38.696155 IP 10.10.15.1 > 192.168.0.199: ICMP echo request, id 3382, seq 82, length 64
08:55:38.751218 IP 192.168.0.199 > 10.10.15.1: ICMP echo reply, id 3382, seq 82, length 64

Edit: additional log to prove that the packets are sent by 10.10.15.3 (MAC 52:54:00:f2:e6:4b), not by 10.10.15.1:

root@hadoop:~# tcpdump -i eth0 -ne icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:55:43.912159 52:54:00:f2:e6:4b > 52:54:00:82:d4:ac, ethertype IPv4 (0x0800), length 98: 10.10.15.1 > 192.168.0.199: ICMP echo request, id 3397, seq 1, length 64
09:55:44.033807 52:54:00:82:d4:ac > 52:54:00:80:a5:aa, ethertype IPv4 (0x0800), length 98: 192.168.0.199 > 10.10.15.1: ICMP echo reply, id 3397, seq 1, length 64
09:55:44.920389 52:54:00:f2:e6:4b > 52:54:00:82:d4:ac, ethertype IPv4 (0x0800), length 98: 10.10.15.1 > 192.168.0.199: ICMP echo request, id 3397, seq 2, length 64
09:55:44.975593 52:54:00:82:d4:ac > 52:54:00:80:a5:aa, ethertype IPv4 (0x0800), length 98: 192.168.0.199 > 10.10.15.1: ICMP echo reply, id 3397, seq 2, length 64

And ifconfig:

root@hadoop2:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:f2:e6:4b  
          inet addr:10.10.15.3  Bcast:10.10.15.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fef2:e64b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:16778 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7877 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:14829038 (14.1 MiB)  TX bytes:835235 (815.6 KiB)
          Interrupt:11 Base address:0x4000 

arp -a -n:

root@hadoop:~# arp -a -n
? (10.10.15.3) at 52:54:00:f2:e6:4b [ether] on eth0
? (192.168.0.199) at 32:a6:ed:93:e6:46 [ether] on tap0
? (10.10.15.1) at 52:54:00:80:a5:aa [ether] on eth0

root@hadoop2:~# arp -a -n
? (10.10.15.2) at 52:54:00:82:d4:ac [ether] on eth0

Yet the source address does get changed. Output of ip route get 192.168.0.199 on hadoop2:

192.168.0.199 via 10.10.15.2 dev eth0  src 10.10.15.3 
    cache 

So the routing decision itself looks right. Let's check iptables; maybe there is some masquerading?

The nat table on hadoop2:

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         

Nope, what about hadoop?

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination  

What else could be changing the source address? Where else could NAT be happening?

Virtualization Host Configuration:

ifconfig:

root@s5 ~ # ifconfig -a
eth0      Link encap:Ethernet  HWaddr 6c:62:6d:a0:77:54  
          inet addr:46.4.56.15  Bcast:46.4.56.63  Mask:255.255.255.192
          inet6 addr: 2a01:4f8:140:140e::2/64 Scope:Global
          inet6 addr: fe80::6e62:6dff:fea0:7754/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14629035 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13602067 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3739676186 (3.7 GB)  TX bytes:1918243832 (1.9 GB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:385337 errors:0 dropped:0 overruns:0 frame:0
          TX packets:385337 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:40556871 (40.5 MB)  TX bytes:40556871 (40.5 MB)

tap0      Link encap:Ethernet  HWaddr b2:bd:05:99:4e:02  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

virbr1    Link encap:Ethernet  HWaddr 52:54:00:80:a5:aa  
          inet addr:10.10.15.1  Bcast:10.10.15.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8297624 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8633037 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:494260090 (494.2 MB)  TX bytes:2661285270 (2.6 GB)

virbr1-nic Link encap:Ethernet  HWaddr 52:54:00:80:a5:aa  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

vnet0     Link encap:Ethernet  HWaddr fe:54:00:82:d4:ac  
          inet6 addr: fe80::fc54:ff:fe82:d4ac/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6365724 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7812413 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:450656918 (450.6 MB)  TX bytes:1363588305 (1.3 GB)

vnet1     Link encap:Ethernet  HWaddr fe:54:00:f2:e6:4b  
          inet6 addr: fe80::fc54:ff:fef2:e64b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2007348 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3291986 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:194231862 (194.2 MB)  TX bytes:192280276 (192.2 MB)

netstat -rn:

root@s5 ~ # netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         46.4.56.1       0.0.0.0         UG        0 0          0 eth0
10.10.15.0      0.0.0.0         255.255.255.0   U         0 0          0 virbr1
46.4.56.0       46.4.56.1       255.255.255.192 UG        0 0          0 eth0
46.4.56.0       0.0.0.0         255.255.255.192 U         0 0          0 eth0

iptables:

root@s5 ~ # iptables -L -n -v
Chain INPUT (policy ACCEPT 3050 packets, 332K bytes)
 pkts bytes target     prot opt in     out     source               destination         
1493K  159M fail2ban-ssh  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 22
    0     0 ACCEPT     udp  --  virbr1 *       0.0.0.0/0            0.0.0.0/0            udp dpt:53
    0     0 ACCEPT     tcp  --  virbr1 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:53
    0     0 ACCEPT     udp  --  virbr1 *       0.0.0.0/0            0.0.0.0/0            udp dpt:67
    0     0 ACCEPT     tcp  --  virbr1 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:67

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
  104  5112 ACCEPT     tcp  --  *      *       0.0.0.0/0            10.10.15.2           state NEW tcp dpt:22
  10M 2673M ACCEPT     all  --  *      virbr1  0.0.0.0/0            10.10.15.0/24        ctstate RELATED,ESTABLISHED
9909K  596M ACCEPT     all  --  virbr1 *       10.10.15.0/24        0.0.0.0/0           
  124  8000 ACCEPT     all  --  virbr1 virbr1  0.0.0.0/0            0.0.0.0/0           
    0     0 REJECT     all  --  *      virbr1  0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
    0     0 REJECT     all  --  virbr1 *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            10.10.15.2           state NEW tcp dpt:22

Chain OUTPUT (policy ACCEPT 2835 packets, 625K bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     udp  --  *      virbr1  0.0.0.0/0            0.0.0.0/0            udp dpt:68

Chain fail2ban-ssh (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   17  1680 REJECT     all  --  *      *       221.229.166.28       0.0.0.0/0            reject-with icmp-port-unreachable
   22  2280 REJECT     all  --  *      *       222.186.21.133       0.0.0.0/0            reject-with icmp-port-unreachable
   21  2164 REJECT     all  --  *      *       222.186.160.51       0.0.0.0/0            reject-with icmp-port-unreachable
   34  2040 REJECT     all  --  *      *       108.31.71.51         0.0.0.0/0            reject-with icmp-port-unreachable
1300K  143M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0        

root@s5 ~ # iptables -t nat -L -n -v
Chain PREROUTING (policy ACCEPT 2622K packets, 156M bytes)
 pkts bytes target     prot opt in     out     source               destination         
   91  4332 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:2022 to:10.10.15.2:22
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:2022 to:10.10.15.2:22

Chain INPUT (policy ACCEPT 766K packets, 45M bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 45990 packets, 3419K bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 1753K packets, 105M bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 RETURN     all  --  *      *       10.10.15.0/24        224.0.0.0/24        
    0     0 RETURN     all  --  *      *       10.10.15.0/24        255.255.255.255     
 1203 72564 MASQUERADE  tcp  --  *      *       10.10.15.0/24       !10.10.15.0/24        masq ports: 1024-65535
 5606  310K MASQUERADE  udp  --  *      *       10.10.15.0/24       !10.10.15.0/24        masq ports: 1024-65535
   76  6384 MASQUERADE  all  --  *      *       10.10.15.0/24       !10.10.15.0/24       
45956 3416K MASQUERADE  all  --  *      eth0    0.0.0.0/0            0.0.0.0/0           
    0     0 MASQUERADE  all  --  *      eth0    0.0.0.0/0            0.0.0.0/0         

brctl show:

root@s5 ~ # brctl show
bridge name bridge id       STP enabled interfaces
virbr1      8000.52540080a5aa   yes     virbr1-nic
                            vnet0
                            vnet1
Dawid Pura
  • Could you redo the tcpdumps with the `-n` flag? Since we can't see your internal namespace, hostnames are worse than useless. – MadHatter May 06 '15 at 06:50
  • Oh, I am sorry @MadHatter, I will edit the post. – Dawid Pura May 06 '15 at 06:52
  • Thanks. For multi-interface boxes, it would also be helpful to know what interface you're dumping; the easiest way to do that is to show the command as well as the output. – MadHatter May 06 '15 at 06:53
  • @MadHatter, done. I used `eth0` everywhere, but you are right, thanks. – Dawid Pura May 06 '15 at 07:00
  • OK, now your problem description says that *Ping from 192.168.0.195 to 10.10.15.3 is working correctly*, but your actual tcpdump output shows it from `192.168.0.199`, which your network diagram says is a different host. So: which is it? I'm sorry if I sound terse, but in sysadmin, **precision is vital**. Also, who is `10.10.15.1`? We know he's alive, because of the tcpdump output from hadoop - but he doesn't appear on your network diagram. Finally, could you update the `netstat -r` output to use `-n` as well? – MadHatter May 06 '15 at 07:22
  • `10.10.15.1` is the gateway, but it is also the wrong address that ends up in the packet. Even if I remove the default route via `10.10.15.1`, the ICMP packet still carries `10.10.15.1` as its source address, even though that host should have no way to change this packet... – Dawid Pura May 06 '15 at 07:46
  • I am very suspicious about that host. Could you redo the `tcpdump` on `hadoop2` for the *failed* ping, but adding `-e` to get MAC addresses? You will also need to let us know the MAC addresses on the 10. network; `arp -a -n` from `hadoop2` may be a simple way. I'd like to confirm that `hadoop2` is really sending those packets out directly to `hadoop`, not via `10.10.15.1`, thus giving it a chance to mess with the packet. – MadHatter May 06 '15 at 07:54
  • As the proof I added shows, `10.10.15.1` is not involved in this packet exchange at all. – Dawid Pura May 06 '15 at 08:01
  • `hadoop` has `52:54:00:82:d4:ac` – Dawid Pura May 06 '15 at 08:04
  • That's not what I asked for . Please: **precision is important**. You've shown me `tcpdump` output from `hadoop`, from which I must **infer** the original destination MAC of the packets as they left `hadoop2` by looking at what `hadoop` receives. We can **know** it by looking at tcpdump from `hadoop2`, and that's what I'm interested in. Can you also assure me these boxes aren't virtualised? – MadHatter May 06 '15 at 08:05
  • @MadHatter I added the `hadoop2` `tcpdump` (replacing the old one). It is virtualized with `qemu`; can that be a problem? – Dawid Pura May 06 '15 at 08:32
  • OK, I'm all out of ideas (sorry). By all rights, that packet should go nowhere near `10.10.15.1` at any point, and if those `iptables -t nat` outputs are from `hadoop` and `hadoop2`, nothing on either of those hosts should touch the source IP address either. I can only suspect something in the virtualisation framework (which of course can do anything, and often does) - and you're not going to find that by poking around on the guest machines. – MadHatter May 06 '15 at 08:39
  • @MadHatter I was out of ideas earlier, so don't worry, and thanks for the help. I am not a sysadmin, just a regular poor developer, and this problem looks really weird to me; I don't know where to go. It's probably a `qemu` issue, but I still have to fix it somehow. Thanks once again! – Dawid Pura May 06 '15 at 08:47
  • Do you have admin access on the qemu host? – MadHatter May 06 '15 at 08:48
  • Yes, I have, but my knowledge of `qemu` is lower than my ability to do magic. – Dawid Pura May 06 '15 at 08:58
  • No doubt, but we could at least check the networking setup. `ifconfig -a`, `netstat -rn`, `iptables -L -n -v; iptables -t nat -L -n -v`, `brctl show` would be a start! – MadHatter May 06 '15 at 09:07
  • @MadHatter, done, a lot of things here :) – Dawid Pura May 06 '15 at 09:16

1 Answer

I very strongly suspect that the problem is this line in the qemu host's iptables -t nat -L -n -v:

   76  6384 MASQUERADE  all  --  *      *       10.10.15.0/24       !10.10.15.0/24       

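This is causing original (i.e., not return-half) traffic from hadoop2 to driver to be NATted to 10.10.15.1, because its source (10.10.15.3) is inside 10.10.15.0/24 and its destination (192.168.0.199) is not.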
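To make the matching explicit: the rule quoted above applies to any packet whose source is inside 10.10.15.0/24 and whose destination is outside it. ICMP only hits this `all` variant, because the two MASQUERADE rules before it are restricted to tcp and udp. The hypothetical bash function `masq_matches` below evaluates that condition with plain shell arithmetic (it does not call iptables), which shows why only the hadoop2-to-driver direction is rewritten, while driver-to-hadoop2 traffic and traffic that stays inside 10.10.15.0/24 are left alone.

```shell
#!/usr/bin/env bash
# Evaluate whether a packet would match the host rule
#   MASQUERADE all -- * * 10.10.15.0/24 !10.10.15.0/24
# i.e. source inside 10.10.15.0/24 AND destination outside it.

# ip_to_int A.B.C.D -> prints the address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_subnet IP NETWORK PREFIXLEN -> exit status 0 if IP is inside NETWORK/PREFIXLEN
in_subnet() {
  local ip net mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# masq_matches SRC DST -> prints "yes" if the rule would rewrite SRC, else "no"
masq_matches() {
  if in_subnet "$1" 10.10.15.0 24 && ! in_subnet "$2" 10.10.15.0 24; then
    echo yes
  else
    echo no
  fi
}
```

For example, `masq_matches 10.10.15.3 192.168.0.199` prints `yes`, while `masq_matches 192.168.0.199 10.10.15.3` and `masq_matches 10.10.15.3 10.10.15.2` both print `no`.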

You could test this hypothesis by exempting just the traffic we're interested in from the NAT:

qemu-host# iptables -t nat -I POSTROUTING 1 -s 10.10.15.3 -d 192.168.0.199 -j ACCEPT

If hadoop then sees the packets with the correct source address, we've nailed the problem. The solution is more complex (it will depend on what else your qemu host is doing, and you will have to work with your admins to sort that out), but at least we will have explained the previously inexplicable NAT that's currently happening.
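A sketch of one possible longer-term direction, under assumptions that should be verified first: that these packets reach the host's nat table even though they are only bridged between vnet1 and vnet0 (the tcpdump on hadoop keeps hadoop2's source MAC 52:54:00:f2:e6:4b while the source IP changes, which is what bridge netfilter with `net.bridge.bridge-nf-call-iptables=1` would produce), and that 192.168.0.0/24 is only reachable through hadoop. Instead of exempting a single host pair, it inserts a `RETURN` rule for all 10.10.15.0/24 to 192.168.0.0/24 traffic at the top of the nat POSTROUTING chain, the same way the existing rules already exempt multicast and broadcast. The function name is hypothetical, and if libvirt manages these rules (the virbr1 naming suggests so), it may regenerate them when the network is restarted, so a manually inserted rule may not persist. Commands are only printed unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch: stop the qemu host from masquerading traffic between the libvirt
# network (10.10.15.0/24) and the routed tap network behind hadoop.
# Assumptions: bridged IPv4 frames on virbr1 traverse iptables (br_netfilter),
# and 192.168.0.0/24 is only reachable through hadoop (10.10.15.2).
# Commands are printed by default; set DRY_RUN=0 to apply them as root.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

exempt_tap_network() {
  # Query the setting: 1 means bridged IPv4 frames are passed to iptables.
  # If the key does not exist, br_netfilter is not loaded.
  run sysctl net.bridge.bridge-nf-call-iptables
  # Insert at position 1, ahead of the MASQUERADE rules. RETURN in a built-in
  # chain hands the packet to the chain policy (ACCEPT here), so none of the
  # MASQUERADE rules below it are evaluated for this traffic.
  run iptables -t nat -I POSTROUTING 1 -s 10.10.15.0/24 -d 192.168.0.0/24 -j RETURN
}
```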

MadHatter