
We currently have a few servers that all use the same LAN:

Host1: eth0 10.0.0.1/24
Host2: eth0 10.0.0.2/24
Host3: eth0 10.0.0.3/24
Gateway: 10.0.0.254 

We want to run some VMs (VirtualBox) on these servers. We can set them up to bridge onto the host's eth0, but we cannot use addresses from the 10.0.0.0/24 range since they may be allocated in the future.

So we figured we'd use a different subnet:

Host1VM: eth0 192.168.0.1/24 (bridge to host eth0)
Host2VM: eth0 192.168.0.2/24 (bridge to host eth0)
Host3VM: eth0 192.168.0.3/24 (bridge to host eth0)

That works, and all the VMs can communicate with each other, since they're on the same subnet and share the same physical interface.

The issue is that we need to give those VMs internet access via the 10.0.0.254 gateway. So we figured: why not pick one of the hosts and use it as a router/NAT?

Host1: eth0 10.0.0.1/24, eth0:0 192.168.0.254/24

Now we can give the VMs a gateway of 192.168.0.254. The problem we then see is that Host1 doesn't seem to NAT properly.

iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o eth0 -j SNAT --to 10.0.0.1

I thought that would work, and we do see the rule matching packets. If a VM pings the internet, we see the ICMP packet come IN to Host1 (since it's the router), Host1 then re-sends the ICMP packet because it's NAT'ing, and the internet host responds back to Host1, but the reply dies there. I expected Host1 to forward the reply back to the VM, but it doesn't.

What am I missing, or is this setup simply not possible?

Edit: To clarify, we have no DENY rules in iptables; everything is default ACCEPT. We have also enabled IP forwarding.
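Taken together, the setup described above (IP forwarding, the 192.168.0.254 alias address and the SNAT rule) can be sketched as one shell function. The interface and addresses come from the question; `ip addr add ... label eth0:0` is the iproute2 way of creating the same alias, and the `run` helper and `DRY_RUN` variable are hypothetical conveniences that print each command instead of executing it, so the sequence can be reviewed before running it as root.

```shell
# Sketch of Host1's router/NAT role, using the names from the question.
# run() and DRY_RUN are hypothetical helpers, not part of any tool.

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

setup_nat_router() {
    # Allow the kernel to forward packets between the two subnets.
    run sysctl -w net.ipv4.ip_forward=1
    # Gateway address for the VMs; the label mimics the old eth0:0 alias.
    run ip addr add 192.168.0.254/24 dev eth0 label eth0:0
    # Rewrite VM source addresses to Host1's 10.0.0.1 on the way out.
    run iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o eth0 \
        -j SNAT --to-source 10.0.0.1
}
```

For example, `DRY_RUN=1 setup_nat_router` prints the three commands, while calling `setup_nat_router` as root applies them.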

Update 1 - iptables

Ignore the virbr0 rules; they're not related to the VirtualBox VMs.

# Completed on Fri Sep 20 16:50:45 2013
# Generated by iptables-save v1.4.12 on Fri Sep 20 16:50:45 2013
*nat
:PREROUTING ACCEPT [171383:10358740]
:INPUT ACCEPT [1923:115365]
:OUTPUT ACCEPT [192:21531]
:POSTROUTING ACCEPT [169544:10254463]
-A POSTROUTING -s 192.168.0.0/24 -o eth0 -j SNAT --to-source 10.0.0.1
COMMIT
# Completed on Fri Sep 20 16:50:45 2013
# Generated by iptables-save v1.4.12 on Fri Sep 20 16:50:45 2013
*filter
:INPUT ACCEPT [96628707:25146145432]
:FORWARD ACCEPT [195035595:22524430122]
:OUTPUT ACCEPT [44035412:304951330498]
-A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A FORWARD -d 192.168.122.0/24 -o virbr0 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A FORWARD -i virbr0 -o virbr0 -j ACCEPT
-A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
COMMIT
# Completed on Fri Sep 20 16:50:45 2013
# Generated by iptables-save v1.4.12 on Fri Sep 20 16:50:45 2013
*mangle
:PREROUTING ACCEPT [291641356:47665886851]
:INPUT ACCEPT [96628707:25146145432]
:FORWARD ACCEPT [195035595:22524430122]
:OUTPUT ACCEPT [44035838:304951365412]
:POSTROUTING ACCEPT [239078922:327477732680]
-A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
-A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
# Completed on Fri Sep 20 16:50:45 2013

Update 2 - tcpdump

16:58:37.189758 IP 192.168.0.2 > 74.125.128.106: ICMP echo request, id 1, seq 2, length 40
16:58:37.189805 IP 10.0.0.1 > 74.125.128.106: ICMP echo request, id 1, seq 2, length 40
16:58:37.194607 IP 74.125.128.106 > 10.0.0.1: ICMP echo reply, id 1, seq 2, length 40
(no final reply back to the VM)
AndyC
  • Well, this is generally a bad idea (two layer 3 networks on a common layer 2), although what you've described should work. Have you done a tcpdump on eth0 of Host1 to confirm it's sending the ICMP packet with a source address of 10.0.0.1? Does the gateway allow ping outbound (i.e., can you ping google.com directly from Host1)? Share the output of `iptables-save` with us – fukawi2 Sep 20 '13 at 08:36
  • Added the `iptables-save` output. I'll also add the output from `tcpdump`. The gateway does allow ping outbound. – AndyC Sep 20 '13 at 08:59
  • @Andy: Did you apply all the rules for NAT, including masquerading? – Pratap Sep 20 '13 at 09:03
  • Is there no way to add a subinterface of 192.168.0.254 on your gateway? If that's possible, it seems like it would be a lot simpler than trying to get consistent results from NAT inside NAT. – David Houde Sep 20 '13 at 09:06
  • @PratapSingh - `SNAT` is used; there's no point using `MASQUERADE` since the address is static – AndyC Sep 20 '13 at 09:08
  • @DavidHoude Unfortunately our hosting provider manages that side of things, and it's not something they can do. We also want to use a local host as an IPsec termination endpoint at some point, so it would be good if we can get it working as above. – AndyC Sep 20 '13 at 09:10

4 Answers


This is telling:

16:58:37.189758 IP 192.168.0.2 > 74.125.128.106: ICMP echo request, id 1, seq 2, length 40
16:58:37.189805 IP 10.0.0.1 > 74.125.128.106: ICMP echo request, id 1, seq 2, length 40
16:58:37.194607 IP 74.125.128.106 > 10.0.0.1: ICMP echo reply, id 1, seq 2, length 40

The upstream router (10.0.0.254) is sending the reply back to Host1, so your routing and SNAT are working. The problem is that Host1 is not passing that reply back to the 192.168.0.0/24 network.
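The reasoning above can be captured as a small diagnostic sketch: it reads `tcpdump -n` lines from stdin and reports the first stage at which a ping from the VM stops making progress. The function name and messages are hypothetical, and it assumes tcpdump's usual `TIME IP SRC > DST: ICMP echo ...` line layout, as in the capture from the question.

```shell
# Classify a `tcpdump -n` capture (stdin) of a ping from a VM through the NAT host.
# $1 = VM address, $2 = SNAT address. Prints the first stage that is missing.
icmp_nat_stage() {
    awk -v vm="$1" -v nat="$2" '
        # Expected layout: TIME IP SRC > DST: ICMP echo request|reply, ...
        $2 == "IP" && /ICMP echo/ {
            src = $3
            dst = $5
            sub(/:$/, "", dst)            # strip the trailing colon from DST
            if (/echo request/ && src == vm)  req = 1     # VM sent the request
            if (/echo request/ && src == nat) natted = 1  # request left with SNAT address
            if (/echo reply/ && dst == nat)   reply = 1   # upstream answered the NAT host
            if (/echo reply/ && dst == vm)    back = 1    # reply forwarded to the VM
        }
        END {
            if (!req)         print "no request from VM"
            else if (!natted) print "request not SNATed"
            else if (!reply)  print "no reply from upstream"
            else if (!back)   print "reply not forwarded to VM"
            else              print "ok"
        }'
}
```

Run against a saved capture, e.g. `icmp_nat_stage 192.168.0.2 10.0.0.1 < capture.txt`; with the three lines from the question it prints `reply not forwarded to VM`, matching the diagnosis here.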

Have you got the appropriate connection tracking kernel modules loaded?

Make sure the traffic is going into the connection tracking table:

grep src=192.168.0.2 /proc/1/net/nf_conntrack
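To make that check a little more specific, the sketch below prints the destination of the reply tuple for flows from a given source. Each nf_conntrack entry lists the original-direction `src=`/`dst=` pair first and the expected reply pair second, so after SNAT the reply destination should be 10.0.0.1. The function name is hypothetical, and the field layout is assumed from the usual /proc/net/nf_conntrack format.

```shell
# Print the reply-direction destination recorded by conntrack for flows whose
# original source is $1. Entries are read from stdin (e.g. /proc/net/nf_conntrack).
# Exits non-zero if no matching entry is found.
conntrack_reply_dst() {
    awk -v src="$1" '
        {
            first_src = ""; reply_dst = ""; ndst = 0
            for (i = 1; i <= NF; i++) {
                # The first src= belongs to the original tuple.
                if ($i ~ /^src=/ && first_src == "") first_src = substr($i, 5)
                # The second dst= belongs to the reply tuple (rewritten by SNAT).
                if ($i ~ /^dst=/ && ++ndst == 2) reply_dst = substr($i, 5)
            }
            if (first_src == src) { print reply_dst; found = 1 }
        }
        END { exit (found ? 0 : 1) }'
}
```

For example, `conntrack_reply_dst 192.168.0.2 < /proc/1/net/nf_conntrack` should print 10.0.0.1 while a ping from that VM is in flight.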

I saw a similar issue with TCP packets just the other day, which turned out to be caused by rp_filter in the kernel. I can't see how that would be the issue here, but it's very similar. Can you post the route table (`ip route show`) from Host1 and check your rp_filter setting (`cat /proc/sys/net/ipv4/conf/default/rp_filter`)?
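One caveat with checking `conf/default/rp_filter`: according to the kernel's ip-sysctl documentation, the value actually used for source validation on an interface is the maximum of `conf/all/rp_filter` and `conf/<interface>/rp_filter`, while `conf/default` only seeds interfaces created later. A minimal sketch that reports the effective value (the function name and the optional base-directory argument, added for testing, are hypothetical):

```shell
# Print the rp_filter value the kernel applies to interface $1: the maximum of
# conf/all/rp_filter and conf/$1/rp_filter (per the kernel ip-sysctl docs).
# $2 optionally overrides the sysctl directory (hypothetical, for testing).
effective_rp_filter() {
    local base all dev
    base=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$base/all/rp_filter") || return 1
    dev=$(cat "$base/$1/rp_filter") || return 1
    if [ "$all" -gt "$dev" ]; then
        echo "$all"
    else
        echo "$dev"
    fi
}
```

`effective_rp_filter eth0` then prints 0 (off), 1 (strict) or 2 (loose) for the interface that actually receives the replies.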

fukawi2
  • I checked conntrack earlier and saw the ICMP entry for the ping, which suggests it's tracking correctly. I had already modified the /etc/sysctl.conf entries to set rp_filter to 0 "just in case", but it hadn't made any difference. I have a feeling this is a bug/caveat with using aliased interfaces. – AndyC Sep 21 '13 at 04:03

I'm going to answer my own question as this is how I solved it. I'm pretty sure the other answers are more than accurate under normal circumstances, but I believe they didn't work due to either a bug or caveat with using interface aliases.

Suspecting that the interface alias was causing the issue, I looked for other ways of providing a virtual interface, and it turns out there is a virtual interface type called MAC-VLAN, which attaches itself to the physical interface as a virtual interface with its own MAC address.

Using that, the NAT worked flawlessly the first time, so I presume it was simply due to the MAC-VLAN interface appearing to the kernel as a completely separate interface, whereas interface aliases cause some confusion somewhere.

For reference the command to create a MAC-VLAN is simply:

ip link add dev macvlan0 link eth0 type macvlan
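Extending that one-liner into a fuller sketch: the `ip addr` and `ip link set` steps below are assumptions about how the rest of the setup would look (moving the 192.168.0.254/24 gateway address from the eth0 alias to the new interface and bringing it up), and `run`/`DRY_RUN` are hypothetical helpers that print commands instead of executing them. The existing SNAT rule (`-s 192.168.0.0/24 -o eth0`) should not need to change, since traffic towards 10.0.0.254 still leaves via eth0.

```shell
# run() and DRY_RUN are hypothetical helpers: print instead of execute.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

setup_macvlan_gateway() {
    # Virtual interface on top of eth0 with its own MAC address.
    run ip link add dev macvlan0 link eth0 type macvlan
    # Assumption: remove the old alias address so it is not configured twice.
    run ip addr del 192.168.0.254/24 dev eth0
    # The VMs' gateway address now lives on macvlan0.
    run ip addr add 192.168.0.254/24 dev macvlan0
    run ip link set dev macvlan0 up
}
```

Preview with `DRY_RUN=1 setup_macvlan_gateway`, then run it as root without `DRY_RUN`.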

It's a really nice feature, I can't believe I've not heard of it before.

AndyC

I wonder why you need to NAT them at all? Since the prefixes are different and you're routing the traffic anyway, why not add a route on the gateway that sends the VM network via the host doing the routing?
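For illustration only, since here the gateway is managed by the hosting provider: if the gateway were a Linux box, the routed alternative would amount to a static route for the VM subnet via Host1, with the SNAT rule removed from Host1. This sketch assumes the gateway would then handle internet access for 192.168.0.0/24 the same way it does for 10.0.0.0/24; the function names and the `run`/`DRY_RUN` helper are hypothetical.

```shell
# run() and DRY_RUN are hypothetical helpers: print instead of execute.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# On the (hypothetical Linux) gateway: reach the VM subnet through Host1.
gateway_route_vm_subnet() {
    run ip route add 192.168.0.0/24 via 10.0.0.1
}

# On Host1: NAT is no longer needed, so delete the SNAT rule from the question.
host1_drop_snat() {
    run iptables -t nat -D POSTROUTING -s 192.168.0.0/24 -o eth0 \
        -j SNAT --to-source 10.0.0.1
}
```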

Also, configuring eth0:0 generally ends up configuring eth0 as well, and vice versa. Use eth0:1, or better yet, stop using the ancient net-tools and use iproute2 instead (ip addr add 192.168.0.254/24 dev eth0, etc.). This might be the problem, as your SNAT rule is correct. If your router doesn't know it's 10.0.0.1, it won't NAT the return traffic back.

If `ip addr show dev eth0` shows both addresses you set, your NAT is configured correctly and should be working. You may have hit a bug, since the NAT host is sending the packets back out the same interface with a different source address, which is an unusual scenario.
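A quick way to do that check in a script, as a sketch: pipe the one-line (`-o`) output of `ip -4 addr show dev eth0` into a helper that confirms each expected address is present. The helper name is hypothetical, and it assumes iproute2 prints each address as `inet ADDR/PREFIX` with spaces on both sides.

```shell
# Succeed if every address given as an argument (ADDR/PREFIX) appears in the
# `ip -o -4 addr show` output read from stdin; report missing ones on stderr.
has_all_addrs() {
    local out a
    out=$(cat)
    for a in "$@"; do
        if ! printf '%s\n' "$out" | grep -qF " inet $a "; then
            echo "missing $a" >&2
            return 1
        fi
    done
}
```

Usage: `ip -o -4 addr show dev eth0 | has_all_addrs 10.0.0.1/24 192.168.0.254/24`.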

However, part of what makes this scenario messy is the existence of ICMP redirects. They're usually held to be evil (they allow attackers to modify your routing tables in some circumstances), so they are almost always disabled, but Linux will usually send them when forwarding traffic out the same interface it came in on. In this case the redirect basically says "that host is on your local segment; you don't need to forward it through me". That would normally be right, except that because of the SNAT, the previous-hop host really does need to forward it through the router. To disable them, use these sysctl settings:

net.ipv4.conf.eth0.send_redirects = 0
net.ipv4.conf.eth0.accept_redirects = 0
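Applied at runtime, those settings can be sketched as below. One addition is based on the kernel's ip-sysctl documentation: `send_redirects` stays enabled on an interface if either `conf/all` or `conf/<interface>` is set, and `conf/all/send_redirects` defaults to 1, so the sketch clears `conf/all` as well. To persist the values, the same keys would go in /etc/sysctl.conf. The function name and the `run`/`DRY_RUN` helper are hypothetical.

```shell
# run() and DRY_RUN are hypothetical helpers: print instead of execute.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Disable ICMP redirects on interface $1 (default eth0).
disable_redirects() {
    local ifc
    ifc=${1:-eth0}
    run sysctl -w "net.ipv4.conf.$ifc.send_redirects=0"
    run sysctl -w "net.ipv4.conf.$ifc.accept_redirects=0"
    # send_redirects stays on if either conf/all or conf/<iface> is 1,
    # so conf/all has to be cleared too.
    run sysctl -w net.ipv4.conf.all.send_redirects=0
}
```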

Depending on the next hop device, you might have to restart that device, configure it in some way, or wait a little while for the redirect to go away, if that is what was causing the trouble.

Falcon Momot
  • Need to NAT them because the internet gateway doesn't know about the 192.168.0.0/24 subnet and we have no control over that side of things (managed by the hosting provider). This is unfortunately just how it is. **Edit** The iproute2 method had the same result. – AndyC Sep 20 '13 at 09:13
  • I recall having this problem a long time ago on a 2.6.20-ish Linux kernel, but I observe more reasonable behaviour now (3.10.7, net-tools from a May 2013 snapshot) where eth0:0 is actually treated as being distinct from eth0 and is tagged correctly, so this might no longer apply, but it can't hurt. – Falcon Momot Sep 20 '13 at 09:19
  • Edited to point out a feature I'd completely forgotten about. – Falcon Momot Sep 20 '13 at 09:43
  • I've applied the redirect settings and left it an hour or so, but it still doesn't work. I may schedule a reboot, if I can, to eliminate any random bugs. – AndyC Sep 20 '13 at 10:48
  • You might want to. That the return packet arrives at your router does imply that the router is somehow messing up, even though the configuration is correct. I also wonder if you've somehow disabled conntrack, though I see you have firewall rules that depend on it, so I don't think you have. – Falcon Momot Sep 20 '13 at 10:56
  • Reboot made no difference, still having the exact same issue. – AndyC Sep 20 '13 at 11:14
  • You could try using a masquerade rule, but the problem there is that you don't get to specify the source address to NAT to. – Falcon Momot Sep 20 '13 at 11:18

Given the topology as I understand it, your best bet would be to give the 192.168.0.0/24 network access to the internet on the router. You'd have to give the router an address in 192.168.0.0/24 for this to work.

If you're not able to do that, then you have to use MASQUERADE, but restrict it to destinations not on the VM network. Something like this on every host:

iptables -t nat -A POSTROUTING -o eth0 ! -d 192.168.0.0/24 -j MASQUERADE

That way, all traffic between VMs goes unchanged, and all traffic from VMs to other networks appears on the wire with the host's address.
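To see which packets the `! -d 192.168.0.0/24` match leaves alone, here is a small sketch of the same subnet test in shell arithmetic: the destination is masked with the prefix length and compared with the masked network address. The function names are hypothetical, and this only models the address comparison, not iptables itself.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_subnet ADDR NET/PREFIX: exit 0 if ADDR lies inside NET/PREFIX.
in_subnet() {
    local addr net prefix mask
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Mirrors `! -d 192.168.0.0/24`: only destinations outside the VM subnet match.
would_masquerade() {
    ! in_subnet "$1" 192.168.0.0/24
}
```

For example, `would_masquerade 74.125.128.106` succeeds, while `would_masquerade 192.168.0.1` fails, so VM-to-VM traffic is left untouched.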

Mathias Weidner
  • There is no need to use MASQUERADE since all the hosts have static addresses. There's also no point MASQUERADE'ing or SNAT'ing the 10.0.0.0/24 hosts' traffic, since source address selection will pick the appropriate 10. address. – fukawi2 Sep 20 '13 at 22:44