
I have keepalived set up on two RHEL 7.8 VMs to provide HA for a shared VIP. The VIP comes up and moves to the correct server when expected, but I see two issues that both seem related to the VIP's MAC address not getting updated after a failover.

  1. Most often, Server2 becomes master and takes over the VIP, but traffic to the VIP still arrives at Server1 first and is then passed on to Server2, which actually hosts the VIP.
  2. Less often, the forwarding described above does not happen and all traffic stops at Server1. A bunch of SYN packets hit Server1, which does not host the VIP, and die there. Server2 never receives the traffic even though it hosts the VIP.

The check and notify scripts all work fine, and the VIP moves to whichever server I expect to be the master. The problem lies with the rest of the network not picking up the new MAC for the VIP.
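
For context on what should be updating that MAC: after a failover, the new master announces the VIP with gratuitous ARP so that neighbours and switches relearn which MAC owns the address. The Python sketch below builds one common form of such a frame (an ARP request whose sender IP and target IP are both the VIP). It is only an illustration, not keepalived's code. The MAC address is a made-up placeholder, and the zeroed target hardware address is an assumption, since implementations differ on that field.

```python
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def ip_to_bytes(ip: str) -> bytes:
    """Convert a dotted-quad IPv4 address into 4 raw bytes."""
    return bytes(int(part) for part in ip.split("."))


def build_gratuitous_arp(sender_mac: str, vip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for vip."""
    src = mac_to_bytes(sender_mac)
    addr = ip_to_bytes(vip)
    # Ethernet header: broadcast destination, our MAC as source, EtherType ARP.
    eth_header = b"\xff" * 6 + src + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request).
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (zeroed here, an assumption) and target IP.
    # Sender IP == target IP is what makes the ARP "gratuitous".
    arp_body = src + addr + b"\x00" * 6 + addr
    return eth_header + arp_header + arp_body


if __name__ == "__main__":
    # Hypothetical MAC for Server2's ens192; the VIP is the one from this post.
    frame = build_gratuitous_arp("00:50:56:00:00:0b", "192.168.1.15")
    print(frame.hex())
```

If frames like this leave the new master but upstream devices keep using the old MAC, the stale entry is most likely somewhere in the Layer 2 path rather than in keepalived itself.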

I've tried various garp_* settings without any luck. Here are the addresses involved and my current config:

  • Server1 = 192.168.1.10
  • Server2 = 192.168.1.11
  • VIP = 192.168.1.15
  • Workstation = 172.16.1.10

Server1 keepalived.conf

! Configuration File for keepalived

global_defs {
    vrrp_garp_master_refresh 10
    vrrp_garp_master_refresh_repeat 2
    vrrp_garp_lower_prio_repeat 2
    vrrp_higher_prio_send_advert true
    enable_script_security
    script_user root
}

vrrp_script chk_nginx_service {
    script "/usr/libexec/keepalived/nginx-ha-check.sh"
    interval 2
    weight 50
    rise 2
    fall 2
}

vrrp_instance VI_1 {
    state MASTER
    interface ens192
    virtual_router_id 51
    priority 101
    advert_int 1

    unicast_src_ip 192.168.1.10
    unicast_peer {
        192.168.1.11
    }
    
    authentication {
        auth_type PASS
        auth_pass 1111
    }

    virtual_ipaddress {
        192.168.1.15
    }

    track_script {
        chk_nginx_service
    }

    notify "/usr/libexec/keepalived/nginx-ha-notify.sh"

}

Server2 keepalived.conf

! Configuration File for keepalived

global_defs {
    vrrp_garp_master_refresh 10
    vrrp_garp_master_refresh_repeat 2
    vrrp_garp_lower_prio_repeat 2
    vrrp_higher_prio_send_advert true
    enable_script_security
    script_user root
}

vrrp_script chk_nginx_service {
    script "/usr/libexec/keepalived/nginx-ha-check.sh"
    interval 2
    weight 50
    rise 2
    fall 2
}

vrrp_instance VI_1 {
    state MASTER
    interface ens192
    virtual_router_id 51
    priority 101
    advert_int 1

    unicast_src_ip 192.168.1.11
    unicast_peer {
        192.168.1.10
    }
    
    authentication {
        auth_type PASS
        auth_pass 1111
    }

    virtual_ipaddress {
        192.168.1.15
    }

    track_script {
        chk_nginx_service
    }

    notify "/usr/libexec/keepalived/nginx-ha-notify.sh"

}

Server1 /var/log/messages when a restart or reload of keepalived happens

Sep 14 11:33:55 server1 systemd: Reloading LVS and VRRP High Availability Monitor.
Sep 14 11:33:55 server1 systemd: Reloaded LVS and VRRP High Availability Monitor.
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: Registering Kernel netlink reflector
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: Registering Kernel netlink command channel
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: Registering gratuitous ARP shared channel
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: Opening file '/etc/keepalived/keepalived.conf'.
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: VRRP_Script(chk_nginx_service) considered successful on reload
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: Using LinkWatch kernel netlink reflector...
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 14 11:33:55 server1 Keepalived_vrrp[99145]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(10,11)]

Server2 /var/log/messages when a restart or reload of keepalived happens

Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: Registering Kernel netlink reflector
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: Registering Kernel netlink command channel
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: Registering gratuitous ARP shared channel
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: Opening file '/etc/keepalived/keepalived.conf'.
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: VRRP_Script(chk_nginx_service) considered successful on reload
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: Using LinkWatch kernel netlink reflector...
Sep 14 09:33:48 server2 Keepalived_vrrp[21124]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(10,13)]
Sep 14 09:33:49 server2 Keepalived_vrrp[21124]: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 14 09:33:50 server2 Keepalived_vrrp[21124]: Sending gratuitous ARP on ens192 for 192.168.1.15
Sep 14 09:33:50 server2 Keepalived_vrrp[21124]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens192 for 192.168.1.15
Sep 14 09:33:50 server2 Keepalived_vrrp[21124]: Sending gratuitous ARP on ens192 for 192.168.1.15
Sep 14 09:34:00 server2 Keepalived_vrrp[21124]: Sending gratuitous ARP on ens192 for 192.168.1.15
Sep 14 09:34:00 server2 Keepalived_vrrp[21124]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens192 for 192.168.1.15
Sep 14 09:34:00 server2 Keepalived_vrrp[21124]: Sending gratuitous ARP on ens192 for 192.168.1.15

Here are tcpdump captures from both servers during scenario 1 above, where Server1 ends up acting as a proxy host for Server2:

tcpdump on Server1 (was the master, but forced to backup for this test)

10:28:51.359055 IP 172.16.1.10.58541 > 192.168.1.15.80: Flags [.], seq 2599:2600, ack 41616, win 1024, length 1: HTTP
10:28:51.359083 IP 172.16.1.10.58542 > 192.168.1.15.80: Flags [.], seq 894:895, ack 469, win 1022, length 1: HTTP
10:28:51.359107 IP 155.155.230.143.58541 > 192.168.1.15.80: Flags [.], seq 2599:2600, ack 41616, win 1024, length 1: HTTP
10:28:51.359117 IP 155.155.230.143.58542 > 192.168.1.15.80: Flags [.], seq 894:895, ack 469, win 1022, length 1: HTTP
10:28:51.366289 IP 192.168.1.15.80 > 192.168.1.10.58542: Flags [.], ack 895, win 31, options [nop,nop,sack 1 {894:895}], length 0
10:28:51.366297 IP 192.168.1.15.80 > 192.168.1.10.58541: Flags [.], ack 2600, win 35, options [nop,nop,sack 1 {2599:2600}], length 0
10:28:51.366309 IP 192.168.1.15.80 > 172.16.1.10.58542: Flags [.], ack 895, win 31, options [nop,nop,sack 1 {894:895}], length 0
10:28:51.366319 IP 192.168.1.15.80 > 172.16.1.10.58541: Flags [.], ack 2600, win 35, options [nop,nop,sack 1 {2599:2600}], length 0
10:28:56.295845 IP 192.168.1.15.80 > 192.168.1.10.58542: Flags [F.], seq 469, ack 895, win 31, length 0
10:28:56.295859 IP 192.168.1.15.80 > 192.168.1.10.58541: Flags [F.], seq 41616, ack 2600, win 35, length 0
10:28:56.295892 IP 192.168.1.15.80 > 172.16.1.10.58542: Flags [F.], seq 469, ack 895, win 31, length 0
10:28:56.295897 IP 192.168.1.15.80 > 172.16.1.10.58541: Flags [F.], seq 41616, ack 2600, win 35, length 0
10:28:56.299555 IP 172.16.1.10.58541 > 192.168.1.15.80: Flags [.], ack 41617, win 1024, length 0
10:28:56.299578 IP 192.168.1.10.58541 > 192.168.1.15.80: Flags [.], ack 41617, win 1024, length 0
10:28:56.299589 IP 172.16.1.10.58542 > 192.168.1.15.80: Flags [.], ack 470, win 1022, length 0
10:28:56.299614 IP 192.168.1.10.58542 > 192.168.1.15.80: Flags [.], ack 470, win 1022, length 0
10:28:56.299789 IP 172.16.1.10.58541 > 192.168.1.15.80: Flags [F.], seq 2600, ack 41617, win 1024, length 0
10:28:56.299808 IP 192.168.1.10.58541 > 192.168.1.15.80: Flags [F.], seq 2600, ack 41617, win 1024, length 0
10:28:56.300063 IP 172.16.1.10.58542 > 192.168.1.15.80: Flags [F.], seq 895, ack 470, win 1022, length 0
10:28:56.300080 IP 192.168.1.10.58542 > 192.168.1.15.80: Flags [F.], seq 895, ack 470, win 1022, length 0
10:28:56.306882 IP 192.168.1.15.80 > 192.168.1.10.58541: Flags [.], ack 2601, win 35, length 0
10:28:56.306911 IP 192.168.1.15.80 > 172.16.1.10.58541: Flags [.], ack 2601, win 35, length 0

tcpdump on Server2 (was the backup, but is now master with the VIP)

12:27:50.343649 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [.], ack 9655, win 1024, length 0
12:27:50.343675 IP 192.168.1.15.80 > 192.168.1.11.58541: Flags [.], seq 20659:23419, ack 409, win 32, length 2760: HTTP
12:27:50.343687 IP 192.168.1.15.80 > 192.168.1.11.58541: Flags [P.], seq 23419:24460, ack 409, win 32, length 1041: HTTP
12:27:50.343694 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [.], ack 12379, win 1024, length 0
12:27:50.354554 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [.], ack 13759, win 1024, length 0
12:27:50.354864 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [.], ack 24460, win 1024, length 0
12:27:51.023348 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [P.], seq 409:843, ack 24460, win 1024, length 434: HTTP: GET /api/dashboards/home HTTP/1.1
12:27:51.039476 IP 192.168.1.15.80 > 192.168.1.11.58541: Flags [P.], seq 24460:26214, ack 843, win 33, length 1754: HTTP: HTTP/1.1 200 OK
12:27:51.050345 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [.], ack 26214, win 1024, length 0
12:27:51.190099 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [P.], seq 843:1287, ack 26214, win 1024, length 444: HTTP: GET /api/plugins?core=0&embedded=0 HTTP/1.1
12:27:51.205890 IP 192.168.1.15.80 > 192.168.1.11.58541: Flags [P.], seq 26214:26448, ack 1287, win 34, length 234: HTTP: HTTP/1.1 200 OK
12:27:51.244474 IP 192.168.1.11.58541 > 192.168.1.15.80: Flags [P.], seq 1287:1721, ack 26448, win 1023, length 434: HTTP: GET /api/search?limit=30 HTTP/1.1
  • You should have tcpdump capture the MAC addresses as well, since that is what is giving you issues. Are the virtual machines hosted on the same physical server? What does the switch network look like? These are important details, as you are dealing with virtual MAC addresses and thus with Layer 2 technology. You need to provide the relevant L2 information. – Tommiie Dec 03 '20 at 14:47
  • Sorry for the late response. The VMs are not on the same physical server/host. It does look to be a layer 2 issue in the switching environment. I'm now working with the network team to iron out the details. – el_sea Mar 10 '21 at 08:32
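
As a rough sketch of the suggestion to capture MAC addresses: running tcpdump with `-e` (for example `tcpdump -e -n -i ens192 host 192.168.1.15`) adds the Ethernet source and destination MAC to each line. The Python snippet below tallies which destination MACs are being used for packets addressed to the VIP. The sample lines and MAC addresses are made up for illustration, and the regular expression assumes the usual `tcpdump -e -n` layout for IPv4 over Ethernet; it may need adjusting for other link types or tcpdump versions.

```python
import re
from collections import Counter

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
IPV4 = r"\d+\.\d+\.\d+\.\d+"

# Matches: "<time> <src mac> > <dst mac>, ethertype IPv4 (0x0800), length N: <src ip>.<port> > <dst ip>.<port>:"
LINE_RE = re.compile(
    rf"^\S+ (?P<src_mac>{MAC}) > (?P<dst_mac>{MAC}), "
    rf"ethertype IPv4 \(0x0800\), length \d+: "
    rf"(?P<src_ip>{IPV4})\.\d+ > (?P<dst_ip>{IPV4})\.\d+:"
)


def dst_macs_for_ip(lines, ip):
    """Count the destination MACs seen on packets whose destination IP is ip."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("dst_ip") == ip:
            counts[match.group("dst_mac")] += 1
    return counts


# Hypothetical capture lines; none of these MACs come from the real environment.
SAMPLE_LINES = [
    "10:28:51.359055 00:50:56:00:00:01 > 00:50:56:00:00:0a, ethertype IPv4 (0x0800), length 60: 172.16.1.10.58541 > 192.168.1.15.80: Flags [.], ack 1, win 1024, length 0",
    "10:28:51.359083 00:50:56:00:00:01 > 00:50:56:00:00:0a, ethertype IPv4 (0x0800), length 60: 172.16.1.10.58542 > 192.168.1.15.80: Flags [.], ack 1, win 1022, length 0",
    "10:28:51.366289 00:50:56:00:00:0a > 00:50:56:00:00:0b, ethertype IPv4 (0x0800), length 60: 172.16.1.10.58541 > 192.168.1.15.80: Flags [.], ack 1, win 1024, length 0",
    "10:28:51.366297 00:50:56:00:00:0b > 00:50:56:00:00:01, ethertype IPv4 (0x0800), length 66: 192.168.1.15.80 > 172.16.1.10.58541: Flags [.], ack 2, win 35, length 0",
]

if __name__ == "__main__":
    for mac, count in dst_macs_for_ip(SAMPLE_LINES, "192.168.1.15").items():
        print(mac, count)
```

If packets for the VIP keep arriving with the old master's MAC as the destination after a failover, the stale entry lives upstream (router ARP cache or switch MAC table) rather than on the servers.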
