I have an LVS-based load balancer which has been working just fine. It runs on two servers using heartbeat to provide failover.
I've added support for a second IP range to the system, but when failover occurs, the server that takes over cannot ARP any IPs in this second range until I remove and re-add the route for that range.
Here's some more detail on what I see on the active load balancer right after failover:
# arp
foo1.example.com ether 00:20:ED:1A:0C:82 C eth0
foo2.example.com ether 00:1E:C9:B0:F6:FE C eth0
bar1.example.com (incomplete) eth0
# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
2.2.2.128 * 255.255.255.192 U 0 0 0 eth0
1.1.1.0 * 255.255.255.0 U 0 0 0 eth0
default 1.1.1.1 0.0.0.0 UG 100 0 0 eth0
So I can't ARP bar1.example.com, which is on the 2.2.2.* netblock.
What I found is that removing and re-adding the route for that netblock fixes the issue:
ip route del 2.2.2.128/26 dev eth0
ip route add 2.2.2.128/26 dev eth0
If I trigger an ARP lookup by pinging bar1.example.com, the ARP cache will now show:
bar1.example.com ether 00:22:19:51:71:E4 C eth0
Does anyone know what's going on here, or know of a way I could get the heartbeat daemon to perform this route deletion and re-adding when it performs the takeover?
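For reference, this is the kind of resource script I'm imagining to automate the workaround. The script name "routeflap", its placement in /etc/ha.d/resource.d/, and the haresources wiring are my assumptions, not a tested setup:

```shell
#!/bin/sh
# Sketch of a heartbeat (v1 haresources-style) resource script that
# re-plumbs the 2.2.2.128/26 route on takeover. Heartbeat calls
# resource scripts with start/stop arguments.
NET="2.2.2.128/26"
DEV="eth0"

routeflap() {
    case "$1" in
    start)
        # Delete and re-add the route so the kernel will ARP on this
        # range again (the same manual fix as above).
        ip route del "$NET" dev "$DEV" 2>/dev/null || true
        ip route add "$NET" dev "$DEV"
        ;;
    stop|status)
        # Nothing to undo when releasing resources; heartbeat only
        # needs a zero exit status here.
        :
        ;;
    esac
}

routeflap "$1"
```

If I understand the resource ordering correctly, it would go in /etc/ha.d/resource.d/ and be listed in haresources after the IPaddr entries for that range, so it runs after the addresses come up on takeover.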