I have keepalived set up on two RHEL 7.8 VMs to provide HA for a shared VIP. The VIP itself works and moves to each server when expected, but I'm seeing two issues, both related to the MAC for the VIP not getting updated:
- Most often, Server2 becomes master and takes over the VIP, but traffic to the VIP still arrives at Server1 first and is then passed on to Server2, which actually hosts the VIP.
- Less often, that forwarding does not happen and all traffic stops at Server1. A burst of SYN packets hits Server1, which does not host the VIP, and dies there. Server2 never sees the traffic even though it hosts the VIP.
The check and notify scripts all work fine, and the VIP transitions to whichever server I expect to be master. The problem seems to be that the MAC address associated with the VIP is not being updated.
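To check which MAC a given host currently associates with the VIP, one option is to look at that host's neighbour (ARP) table and compare the VIP's entry with each server's interface MAC. The rough Python sketch below only parses text in the format printed by `ip neigh show`. The helper name `mac_for_ip` is hypothetical and the MAC addresses in the sample are invented for illustration.

```python
def mac_for_ip(neigh_output, ip):
    """Return the lladdr recorded for `ip` in `ip neigh show` style text, or None."""
    for line in neigh_output.splitlines():
        fields = line.split()
        # Typical entry: "<ip> dev <ifname> lladdr <mac> <STATE>"
        if fields and fields[0] == ip and "lladdr" in fields:
            return fields[fields.index("lladdr") + 1].lower()
    return None

# Illustrative sample only: these MAC addresses are invented.
sample = """\
192.168.1.15 dev ens192 lladdr 00:50:56:aa:bb:01 STALE
192.168.1.11 dev ens192 lladdr 00:50:56:aa:bb:02 REACHABLE
"""

vip_mac = mac_for_ip(sample, "192.168.1.15")
server2_mac = mac_for_ip(sample, "192.168.1.11")
print("VIP resolves to", vip_mac)
print("matches Server2:", vip_mac == server2_mac)
```

Since the workstation (172.16.1.10) is on a different subnet from the VIP, the table that matters is presumably the one on the router for the 192.168.1.x network rather than the workstation's own. That is an assumption about the topology.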
I've tried various garp_* settings with no luck. Here are the addresses involved, followed by my current config:
- Server1 = 192.168.1.10
- Server2 = 192.168.1.11
- VIP = 192.168.1.15
- Workstation = 172.16.1.10
Server1 keepalived.conf
! Configuration File for keepalived
global_defs {
    vrrp_garp_master_refresh 10
    vrrp_garp_master_refresh_repeat 2
    vrrp_garp_lower_prio_repeat 2
    vrrp_higher_prio_send_advert true
    enable_script_security
    script_user root
}
vrrp_script chk_nginx_service {
    script "/usr/libexec/keepalived/nginx-ha-check.sh"
    interval 2
    weight 50
    rise 2
    fall 2
}
vrrp_instance VI_1 {
    state MASTER
    interface ens192
    virtual_router_id 51
    priority 101
    advert_int 1
    unicast_src_ip 192.168.1.10
    unicast_peer {
        192.168.1.11
    }
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.1.15
    }
    track_script {
        chk_nginx_service
    }
    notify "/usr/libexec/keepalived/nginx-ha-notify.sh"
}
Server2 keepalived.conf
! Configuration File for keepalived
global_defs {
    vrrp_garp_master_refresh 10
    vrrp_garp_master_refresh_repeat 2
    vrrp_garp_lower_prio_repeat 2
    vrrp_higher_prio_send_advert true
    enable_script_security
    script_user root
}
vrrp_script chk_nginx_service {
    script "/usr/libexec/keepalived/nginx-ha-check.sh"
    interval 2
    weight 50
    rise 2
    fall 2
}
vrrp_instance VI_1 {
    state MASTER
    interface ens192
    virtual_router_id 51
    priority 101
    advert_int 1
    unicast_src_ip 192.168.1.11
    unicast_peer {
        192.168.1.10
    }
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.1.15
    }
    track_script {
        chk_nginx_service
    }
    notify "/usr/libexec/keepalived/nginx-ha-notify.sh"
}
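For reference, here is a minimal Python sketch of how the effective priorities work out under these two configs. It assumes keepalived's documented weight rule: a positive weight is added to the priority while the tracked script succeeds, and a negative weight is applied while it fails. The function name `effective_priority` is just for illustration and is not part of keepalived.

```python
def effective_priority(base, weight, script_ok):
    """Priority under the assumed weight rule.

    Positive weight: added while the script succeeds.
    Negative weight: added (so the priority drops) while the script fails.
    """
    if weight > 0 and script_ok:
        return base + weight
    if weight < 0 and not script_ok:
        return base + weight
    return base

# Values taken from the two configs above: both use priority 101, weight 50.
nodes = {"Server1": (101, 50), "Server2": (101, 50)}

for nginx_ok in (True, False):
    for name, (base, weight) in nodes.items():
        print(name, "nginx_ok =", nginx_ok, "->",
              effective_priority(base, weight, nginx_ok))
```

Under that assumption both nodes come out at 151 while nginx is healthy and 101 when it is not, so the check script alone never makes one node outrank the other. My understanding is that with equal priorities VRRP breaks the tie on the higher primary IP address, which would favour Server2 (192.168.1.11).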
Server1 /var/log/messages when a restart or reload of keepalived happens
Sep 14 11:33:55 server1 systemd: Reloading LVS and VRRP High Availability Monitor.
Sep 14 11:33:55 server1 systemd: Reloaded LVS and VRRP High Availability Monitor.
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: Registering Kernel netlink reflector
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: Registering Kernel netlink command channel
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: Registering gratuitous ARP shared channel
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: Opening file '/etc/keepalived/keepalived.conf'.
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: VRRP_Script(chk_nginx_service) considered successful on reload
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: Using LinkWatch kernel netlink reflector...
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(10,11)]
Server2 /var/log/messages when a restart or reload of keepalived happens
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: Registering Kernel netlink reflector
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: Registering Kernel netlink command channel
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: Registering gratuitous ARP shared channel
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: Opening file '/etc/keepalived/keepalived.conf'.
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: VRRP_Script(chk_nginx_service) considered successful on reload
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: Using LinkWatch kernel netlink reflector...
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(10,13)]
Sep 14 09:33:49 server2 Keepalived_vrrp[21124]: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 14 09:33:50 server2 Keepalived_vrrp[21124]: Sending gratuitous ARP on ens192 for 192.168.1.15
Sep 14 09:33:50 server2 Keepalived_vrrp[21124]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens192 for 192.168.1.15
Sep 14 09:33:50 server2 Keepalived_vrrp[21124]: Sending gratuitous ARP on ens192 for 192.168.1.15
Sep 14 09:34:00 server2 Keepalived_vrrp[21124]: Sending gratuitous ARP on ens192 for 192.168.1.15
Sep 14 09:34:00 server2 Keepalived_vrrp[21124]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens192 for 192.168.1.15
Sep 14 09:34:00 server2 Keepalived_vrrp[21124]: Sending gratuitous ARP on ens192 for 192.168.1.15
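As a sanity check that the GARP refresh setting is taking effect, the spacing between GARP bursts can be read off the log timestamps. The short Python sketch below does that for the two "Sending/queueing" lines copied from the Server2 log above. The helper names are hypothetical, and the year is a dummy value because syslog timestamps do not include one.

```python
from datetime import datetime

# GARP burst headers copied from the Server2 log above.
log_lines = [
    "Sep 14 09:33:50 server2 Keepalived_vrrp[21124]: VRRP_Instance(VI_1) "
    "Sending/queueing gratuitous ARPs on ens192 for 192.168.1.15",
    "Sep 14 09:34:00 server2 Keepalived_vrrp[21124]: VRRP_Instance(VI_1) "
    "Sending/queueing gratuitous ARPs on ens192 for 192.168.1.15",
]

def garp_burst_times(lines):
    """Timestamps of lines announcing a gratuitous ARP burst."""
    times = []
    for line in lines:
        if "Sending/queueing gratuitous ARPs" in line:
            # Dummy year 2000: only the differences between timestamps matter.
            stamp = "2000 " + line[:15]
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times

def burst_intervals(lines):
    """Seconds between consecutive GARP bursts."""
    times = garp_burst_times(lines)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]

print(burst_intervals(log_lines))
```

For these two lines the interval is 10 seconds, which matches `vrrp_garp_master_refresh 10`, so Server2 does appear to be sending the refresh GARPs on schedule.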
Here are the tcpdump captures from both servers during the first scenario above, where Server1 ends up acting as a proxy host for Server2:
tcpdump on Server1 (was the master, but forced to backup for this test)
10:28:51.359055 IP 172.16.1.10.58541 > 192.168.1.15.80: Flags [.], seq 2599:2600, ack 41616, win 1024, length 1: HTTP
10:28:51.359083 IP 172.16.1.10.58542 > 192.168.1.15.80: Flags [.], seq 894:895, ack 469, win 1022, length 1: HTTP
10:28:51.359107 IP 155.155.230.143.58541 > 192.168.1.15.80: Flags [.], seq 2599:2600, ack 41616, win 1024, length 1: HTTP
10:28:51.359117 IP 155.155.230.143.58542 > 192.168.1.15.80: Flags [.], seq 894:895, ack 469, win 1022, length 1: HTTP
10:28:51.366289 IP 192.168.1.15.80 > 192.168.1.10.58542: Flags [.], ack 895, win 31, options [nop,nop,sack 1 {894:895}], length 0
10:28:51.366297 IP 192.168.1.15.80 > 192.168.1.10.58541: Flags [.], ack 2600, win 35, options [nop,nop,sack 1 {2599:2600}], length 0
10:28:51.366309 IP 192.168.1.15.80 > 172.16.1.10.58542: Flags [.], ack 895, win 31, options [nop,nop,sack 1 {894:895}], length 0
10:28:51.366319 IP 192.168.1.15.80 > 172.16.1.10.58541: Flags [.], ack 2600, win 35, options [nop,nop,sack 1 {2599:2600}], length 0
10:28:56.295845 IP 192.168.1.15.80 > 192.168.1.10.58542: Flags [F.], seq 469, ack 895, win 31, length 0
10:28:56.295859 IP 192.168.1.15.80 > 192.168.1.10.58541: Flags [F.], seq 41616, ack 2600, win 35, length 0
10:28:56.295892 IP 192.168.1.15.80 > 172.16.1.10.58542: Flags [F.], seq 469, ack 895, win 31, length 0
10:28:56.295897 IP 192.168.1.15.80 > 172.16.1.10.58541: Flags [F.], seq 41616, ack 2600, win 35, length 0
10:28:56.299555 IP 172.16.1.10.58541 > 192.168.1.15.80: Flags [.], ack 41617, win 1024, length 0
10:28:56.299578 IP 192.168.1.10.58541 > 192.168.1.15.80: Flags [.], ack 41617, win 1024, length 0
10:28:56.299589 IP 172.16.1.10.58542 > 192.168.1.15.80: Flags [.], ack 470, win 1022, length 0
10:28:56.299614 IP 192.168.1.10.58542 > 192.168.1.15.80: Flags [.], ack 470, win 1022, length 0
10:28:56.299789 IP 172.16.1.10.58541 > 192.168.1.15.80: Flags [F.], seq 2600, ack 41617, win 1024, length 0
10:28:56.299808 IP 192.168.1.10.58541 > 192.168.1.15.80: Flags [F.], seq 2600, ack 41617, win 1024, length 0
10:28:56.300063 IP 172.16.1.10.58542 > 192.168.1.15.80: Flags [F.], seq 895, ack 470, win 1022, length 0
10:28:56.300080 IP 192.168.1.10.58542 > 192.168.1.15.80: Flags [F.], seq 895, ack 470, win 1022, length 0
10:28:56.306882 IP 192.168.1.15.80 > 192.168.1.10.58541: Flags [.], ack 2601, win 35, length 0
10:28:56.306911 IP 192.168.1.15.80 > 172.16.1.10.58541: Flags [.], ack 2601, win 35, length 0
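To make the pattern in the Server1 capture easier to see, here is a rough Python sketch that groups packets which are identical apart from their source IP. It uses four lines copied from the capture above. The regular expression and the helper name `sources_per_segment` are my own and only cover the simple `IP a.b.c.d.port > a.b.c.d.port:` line shape shown here.

```python
import re
from collections import defaultdict

# Four lines copied from the Server1 capture above.
capture = """\
10:28:56.299555 IP 172.16.1.10.58541 > 192.168.1.15.80: Flags [.], ack 41617, win 1024, length 0
10:28:56.299578 IP 192.168.1.10.58541 > 192.168.1.15.80: Flags [.], ack 41617, win 1024, length 0
10:28:56.299589 IP 172.16.1.10.58542 > 192.168.1.15.80: Flags [.], ack 470, win 1022, length 0
10:28:56.299614 IP 192.168.1.10.58542 > 192.168.1.15.80: Flags [.], ack 470, win 1022, length 0
"""

LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\d+(?:\.\d+){3})\.(?P<sport>\d+) > "
    r"(?P<dst>\d+(?:\.\d+){3})\.(?P<dport>\d+): (?P<rest>.*)$"
)

def sources_per_segment(text):
    """Map (sport, dst, dport, flags/seq text) to the source IPs seen for it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            key = (m["sport"], m["dst"], m["dport"], m["rest"])
            groups[key].append(m["src"])
    return groups

for key, sources in sources_per_segment(capture).items():
    if len(set(sources)) > 1:
        print("source port", key[0], "seen from", sources)
```

Each segment to the VIP shows up twice on Server1: once with the workstation's address as the source and again with Server1's own address, which is what makes me describe Server1 as acting like a proxy.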
tcpdump on Server2 (was the backup, now master with the VIP)
12:27:50.343649 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [.], ack 9655, win 1024, length 0
12:27:50.343675 IP 192.168.1.15.80 > 192.168.1.11.58541: Flags [.], seq 20659:23419, ack 409, win 32, length 2760: HTTP
12:27:50.343687 IP 192.168.1.15.80 > 192.168.1.11.58541: Flags [P.], seq 23419:24460, ack 409, win 32, length 1041: HTTP
12:27:50.343694 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [.], ack 12379, win 1024, length 0
12:27:50.354554 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [.], ack 13759, win 1024, length 0
12:27:50.354864 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [.], ack 24460, win 1024, length 0
12:27:51.023348 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [P.], seq 409:843, ack 24460, win 1024, length 434: HTTP: GET /api/dashboards/home HTTP/1.1
12:27:51.039476 IP 192.168.1.15.80 > 192.168.1.11.58541: Flags [P.], seq 24460:26214, ack 843, win 33, length 1754: HTTP: HTTP/1.1 200 OK
12:27:51.050345 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [.], ack 26214, win 1024, length 0
12:27:51.190099 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [P.], seq 843:1287, ack 26214, win 1024, length 444: HTTP: GET /api/plugins?core=0&embedded=0 HTTP/1.1
12:27:51.205890 IP 192.168.1.15.80 > 192.168.1.11.58541: Flags [P.], seq 26214:26448, ack 1287, win 34, length 234: HTTP: HTTP/1.1 200 OK
12:27:51.244474 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [P.], seq 1287:1721, ack 26448, win 1023, length 434: HTTP: GET /api/search?limit=30 HTTP/1.1