I currently have a lab network set up to mimic a customer's network deployment. (According to the customer, they use VRF over VLAN so that their own customers can have overlapping networks connecting to the same server host.)
Abstracting away all the complexity, my setup looks like this:
       client 1                                   server
+--------------------+             +----------------------------------+
|     vlan 100       |             |                                  |
|                    |        +----+-----+                            |
| ip: 192.168.1.100  |------->|    |  e  |  vlan100                   |
| on eth1            |        |    |  t  |                            |
|                    |        |    |  h  |  ip: 192.168.1.200         |
+--------------------+        | e  | 0.1 |                            |
                              | t  +-----+                            |
       client 2               | h  +-----+                            |
+--------------------+        | 0  |  e  |  vlan101                   |
|     vlan 101       |        |    |  t  |                            |
|                    |        |    |  h  |  ip: 192.168.1.200         |
| ip: 192.168.1.100  |------->|    | 0.2 |                            |
| on eth1            |        +----+-----+                            |
|                    |             |                                  |
+--------------------+             +----------------------------------+
Please don't ask me why it is set up this way; it reproduces the scenario our customer will have on their deployed server. The essence is that multiple VLANs provide multiple overlapping virtual IP networks, and the server has the same IP address in all of them. The server can therefore expect connections from clients in different virtual networks with colliding IP addresses.
Our server accepts connections fine: we bind a socket to each individual VLAN interface, and that all works. What puzzles me is that client 2 cannot ping the server (192.168.1.200), because all the ICMP replies are sent to client 1.
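For reference, per-interface binding of the kind described above can be sketched in Python as follows. This is a minimal sketch, not our actual server code: it assumes a Linux server, TCP, the VLAN interface names eth0.1 and eth0.2 (taken from the ARP cache output further down), and a hypothetical port 5000; `ifname_option`, `listen_on_vlan` and `listen_on_all_vlans` are illustrative names.

```python
import socket

# Python exposes SO_BINDTODEVICE on Linux; 25 is its Linux value and is
# only used as a fallback if this Python build lacks the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)

SERVER_IP = "192.168.1.200"
PORT = 5000                          # hypothetical service port
VLAN_IFACES = ("eth0.1", "eth0.2")   # VLAN sub-interfaces from the ARP output
IFNAMSIZ = 16                        # Linux name limit, including the trailing NUL


def ifname_option(ifname: str) -> bytes:
    """Encode an interface name as the option value for SO_BINDTODEVICE."""
    raw = ifname.encode("ascii")
    if not raw or len(raw) >= IFNAMSIZ:
        raise ValueError(f"invalid interface name: {ifname!r}")
    return raw + b"\0"


def listen_on_vlan(ifname: str, ip: str = SERVER_IP, port: int = PORT) -> socket.socket:
    """Open a TCP listener that only accepts connections arriving on `ifname`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts while old connections linger in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Tie the socket to one device *before* bind(), so sockets on different
    # VLANs can all bind the same ip:port without an address-in-use error.
    sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, ifname_option(ifname))
    sock.bind((ip, port))
    sock.listen()
    return sock


def listen_on_all_vlans(ifaces=VLAN_IFACES) -> dict:
    """One listener per VLAN interface, keyed by interface name."""
    return {name: listen_on_vlan(name) for name in ifaces}
```

On the server, `listen_on_all_vlans()` would return two listeners that share 192.168.1.200:5000, one per VLAN. Depending on the kernel version, setting SO_BINDTODEVICE may require root or CAP_NET_RAW.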
I looked at the packets captured with tcpdump. The server receives the ICMP echo request from client 2 without any problem, but its ICMP reply is sent to client 1. Before sending the reply to 192.168.1.100, the server first sends out an ARP request "who-has 192.168.1.100 tell 192.168.1.200" via eth0.1, so client 1 answers with an ARP reply telling the server that it has IP 192.168.1.100.
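The sequence in the capture can be pictured with a toy model (plain Python, not kernel code). It assumes that configuring 192.168.1.200/24 on both eth0.1 and eth0.2 puts two equal connected 192.168.1.0/24 routes into a single routing table, that the reply is routed by destination address only, that the tie goes to the route added first (eth0.1), and that neighbour (ARP) entries are kept per device. `Route`, `Host` and their methods are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Route:
    prefix: ipaddress.IPv4Network
    dev: str


@dataclass
class Host:
    routes: list                                 # one table, in insertion order
    neigh: dict = field(default_factory=dict)    # (ip, dev) -> MAC, per device
    arp_log: list = field(default_factory=list)  # ARP requests we had to send

    def learn(self, ip: str, dev: str, mac: str) -> None:
        """Remember a neighbour seen on one specific device."""
        self.neigh[(ip, dev)] = mac

    def route_lookup(self, dst: str) -> Route:
        """Longest-prefix match on the destination only; among equal matches
        the earliest route wins (an assumption of this model).
        Raises ValueError if no route matches (this toy has no default route)."""
        addr = ipaddress.IPv4Address(dst)
        matches = [r for r in self.routes if addr in r.prefix]
        best = max(r.prefix.prefixlen for r in matches)
        return next(r for r in matches if r.prefix.prefixlen == best)

    def send_reply(self, dst: str, owners: dict) -> tuple:
        """Choose the outgoing device first, then resolve dst on that device only.
        `owners` maps each device to the MAC of whoever answers ARP for dst there."""
        dev = self.route_lookup(dst).dev
        mac = self.neigh.get((dst, dev))
        if mac is None:
            self.arp_log.append(f"who-has {dst} tell 192.168.1.200 via {dev}")
            mac = owners[dev]
            self.learn(dst, dev, mac)
        return dev, mac


net = ipaddress.IPv4Network("192.168.1.0/24")
server = Host(routes=[Route(net, "eth0.1"), Route(net, "eth0.2")])

# Client 2's ARP request ("who-has 192.168.1.200") arrives on eth0.2.
server.learn("192.168.1.100", "eth0.2", "f0:92:1c:19:d1:c1")

# Its echo request also arrives on eth0.2, but the reply is routed by destination.
owners = {"eth0.1": "f0:92:1c:19:a0:01", "eth0.2": "f0:92:1c:19:d1:c1"}
dev, mac = server.send_reply("192.168.1.100", owners)
print(dev, mac)        # eth0.1 f0:92:1c:19:a0:01  (client 1's MAC)
print(server.arp_log)  # ['who-has 192.168.1.100 tell 192.168.1.200 via eth0.1']
```

After this run the model holds an entry for 192.168.1.100 on both eth0.1 and eth0.2, which is the same shape as the server's real ARP cache; under these assumptions, knowing client 2's MAC on eth0.2 does not help once the reply has been assigned to eth0.1.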
So I am wondering two things. First, why does the server need to send an ARP request asking who has 192.168.1.100 again? It should already know that 192.168.1.100 belongs to client 2 from client 2's initial ARP request asking "who-has 192.168.1.200". The ARP cache on the server also confirms that the server knows this:
192.168.1.100 ether f0:92:1c:19:a0:01 C eth0.1
192.168.1.100 ether f0:92:1c:19:d1:c1 C eth0.2
The packets captured with tcpdump also confirm this. See the screenshot here:
Second, is there a way to tell the server to send ICMP replies only via the interface the request came in on? That way, even though there are entries for 192.168.1.100 on both eth0.1 and eth0.2, the ICMP reply would be sent back to VLAN 101 correctly.
I am not much of a network engineer, so if I have made any mistakes or wrong statements, please bear with me and point them out. Thanks in advance for any answers and help.
P.S. I have actually turned arp_ignore on. These are my server settings:
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0/1.arp_announce = 2
net.ipv4.conf.eth0/1.arp_ignore = 1
net.ipv4.conf.eth0/2.arp_announce = 2
net.ipv4.conf.eth0/2.arp_ignore = 1
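In the settings above, eth0/1 and eth0/2 are the sysctl spelling of the VLAN interfaces eth0.1 and eth0.2: because "." separates the components of a sysctl key, a dot inside an interface name is written as "/", while the corresponding file under /proc/sys/net/ipv4/conf/ keeps the real name. The Python sketch below maps between the two spellings and reads the value the kernel actually applies, which for arp_ignore and arp_announce is the maximum of the "all" setting and the per-interface setting. It assumes a Linux host with /proc mounted; the helper names are hypothetical.

```python
from pathlib import Path

PROC_CONF = Path("/proc/sys/net/ipv4/conf")


def sysctl_key(ifname: str, param: str) -> str:
    """Spell a per-interface key the way sysctl(8) prints it: dots inside the
    interface name become slashes because '.' separates key components."""
    return f"net.ipv4.conf.{ifname.replace('.', '/')}.{param}"


def proc_path(ifname: str, param: str) -> Path:
    """The same setting as a file under /proc/sys, which keeps the real name."""
    return PROC_CONF / ifname / param


def read_param(ifname: str, param: str) -> int:
    """Read one integer setting straight from /proc."""
    return int(proc_path(ifname, param).read_text().strip())


def effective(ifname: str, param: str) -> int:
    """For arp_ignore and arp_announce the kernel uses the maximum of the
    'all' value and the per-interface value."""
    return max(read_param("all", param), read_param(ifname, param))
```

For example, `sysctl_key("eth0.1", "arp_ignore")` gives "net.ipv4.conf.eth0/1.arp_ignore", matching the list above, and on the server `effective("eth0.1", "arp_ignore")` should report 1 with these settings.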