Finally solved it. Here is how. Sorry for this very long post, but I spent a lot of time on this and I think some people might be interested in a detailed solution.
The setup
+------------------------------------------------------------------------------------------------+
| HOST                                                                                            |
|                                                                                                |
|  +--------------------------------------------------+                                          |
|  | UBUNTU-VM                                        |                                          |
|  |                                                  |                                          |
|  |  +-------------------+                           |                                          |
|  |  | UBUNTU-LXC        |                           |                   +------------------+   |
|  |  |   10.0.3.233/24   | 10.0.3.1/24               |                   | OTHER VM         |   |
|  |  | eth0----------------lxcbr0------eth0------------------br0-------------eth0           |   |
|  |  |                   |           192.168.100.2/24| 192.168.100.1/24  |192.168.100.3/24  |   |
|  |  +-------------------+                           |                   +------------------+   |
|  +--------------------------------------------------+                                          |
+------------------------------------------------------------------------------------------------+
1. Removing the NAT on UBUNTU-VM
The reason why my packets egress UBUNTU-VM with source 192.168.100.2 is the default iptables rule that is created when I start my container:
root@UBUNTU-VM# iptables -nL -t nat
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- 10.0.3.0/24 !10.0.3.0/24
This rule basically says: "if the packet comes from subnet 10.0.3.0/24 and its destination is outside that subnet, rewrite the source IP". So if I delete this rule, I should be able to ping the outside world using my container's IP address. Let's remove it:
root@UBUNTU-VM# iptables -D POSTROUTING 1 -t nat
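Note that this MASQUERADE rule is re-created every time the container network is started, so the deletion above does not survive a restart. A sketch for re-deleting it by exact match (assuming the stock 10.0.3.0/24 LXC subnet), e.g. from a boot script:
root@UBUNTU-VM# iptables -t nat -D POSTROUTING -s 10.0.3.0/24 ! -d 10.0.3.0/24 -j MASQUERADE
root@UBUNTU-VM# iptables -t nat -S POSTROUTING
The second command prints the chain in iptables-save format; no MASQUERADE line should remain.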
Now, if I ping 192.168.100.1 from my LXC container (10.0.3.233), here is what happens:
root@HOST# tcpdump -i br0 -n
12:51:56.174009 IP 10.0.3.233 > 192.168.100.1: ICMP echo request, id 498, seq 1, length 64
12:51:56.174072 ARP, Request who-has 10.0.3.233 tell 192.168.100.1, length 28
ICMP requests are coming from my LXC IP address :) However, br0 seems unable to answer.
2. Adding a route on the HOST
root@HOST# ip route add 10.0.0.0/8 via 192.168.100.2
Now the gateway for the 10.0.0.0/8 subnet is eth0 on UBUNTU-VM (192.168.100.2). Let's try a ping:
root@HOST# tcpdump -i br0 -n
14:14:33.885982 IP 10.0.3.233 > 192.168.100.1: ICMP echo request, id 660, seq 14, length 64
14:14:34.884054 ARP, Request who-has 10.0.3.233 tell 192.168.100.1, length 28
It still does not work. I have no explanation for this, unfortunately. Worse, why is br0 making an ARP request for an IP that is not even in its subnet? At the very least, I would expect the ICMP request to be silently ignored, but answering with an ARP request is just weird.
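A useful diagnostic at this point is ip route get, which shows the next hop the kernel would choose for the reply without sending anything. With the route above in place, I would expect output roughly like this (a sketch; the exact format varies with the iproute2 version):
root@HOST# ip route get 10.0.3.233
10.0.3.233 via 192.168.100.2 dev br0 src 192.168.100.1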
3. Configuring libvirt
3.1. Current config
br0 is a bridge I configured manually on the host, using netctl. In my UBUNTU-VM template I have this:
<interface type='bridge'>
  <mac address='52:54:00:cb:aa:74'/>
  <source bridge='br0'/>
  <model type='e1000'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</interface>
When the VMs are created, libvirt (via qemu/kvm) creates a tap interface (vnetX) for each VM and attaches it to the bridge:
root@HOST# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.fe0000000001       no              vnet1
                                                        vnet2
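As a side note, you can check what vnet1 actually is with ip -d link; the detail output should report a tun/tap device enslaved to br0 (the exact fields depend on the iproute2 version):
root@HOST# ip -d link show vnet1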
For some reason, this does not work (edits/comments would be appreciated). The solution was to configure a routed network instead of just a bridged network.
3.2. Define a network
Create an XML template for your network:
<network>
  <name>vms</name>
  <uuid>f3e18be1-41fe-4f34-87b4-f279f4a02254</uuid>
  <forward mode='route'/>
  <bridge name='br0' stp='on' delay='0'/>
  <mac address='52:54:00:86:f3:04'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'>
  </ip>
  <route address='10.0.0.0' prefix='8' gateway='192.168.100.2'/>
</network>
Note the <route> stanza: it is the libvirt equivalent of the static route I added by hand in step 2.
Then define it and start it:
virsh # net-define vms.xml
virsh # net-start vms
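To double-check the result before touching the VM, list the networks and dump the XML back:
virsh # net-list --all
virsh # net-dumpxml vms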
3.3. Edit the VM
The interface should now look like this:
<interface type='network'>
  <mac address='52:54:00:cb:aa:74'/>
  <source network='vms'/>
  <model type='e1000'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</interface>
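One way to apply this change (assuming the domain is named UBUNTU-VM) is virsh edit, followed by a full restart of the VM so the new interface definition takes effect:
virsh # edit UBUNTU-VM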
Final test
After restarting the VM and the container, I can finally ping br0 using the LXC container's IP:
root@HOST# tcpdump -i br0 -n
14:24:00.349856 IP 10.0.3.233 > 192.168.100.1: ICMP echo request, id 468, seq 16, length 64
14:24:00.349900 IP 192.168.100.1 > 10.0.3.233: ICMP echo reply, id 468, seq 16, length 64
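As a last sanity check, the container should now also reach OTHER VM, assuming it is up at 192.168.100.3 (as in the diagram) and has a route back to 10.0.3.0/24 (directly via 192.168.100.2, or through the host):
root@UBUNTU-LXC# ping -c 3 192.168.100.3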
Remaining questions
- Why the ARP requests in step 2?
- Why does my setup not work unless I let libvirt handle the bridge and the routing itself? My manual config (creating the bridge with netctl, and adding the route with ip route add) is very similar to what libvirt does: a bridge with two vnet interfaces attached to it, and a static route... Is libvirt doing some black magic here?
- Will I be able to scale the number of containers with this setup (that is my final goal)?
Sources that helped