Keepalived heartbeats not being sent/received on EC2 VPC

Question

I'm trying to set up keepalived + HAProxy as a redundant load balancer in an EC2 VPC (yes, I know that ELB is an option). I believe we have things configured correctly, but killing the master server doesn't trigger a failover.

Server A Config:

vrrp_script chk_haproxy {
  script "pidof haproxy"
  interval 2
}

vrrp_instance VI_1 {
  interface eth0
  state BACKUP
  priority 100
  nopreempt

  virtual_router_id 33
  unicast_src_ip 172.30.1.100
  unicast_peer {
    172.30.1.101
  }

  authentication {
    auth_type PASS
    auth_pass PASSWORD
  }

  track_script {
    chk_haproxy
  }
  notify_master /etc/keepalived/master.sh
}

Server B Config:

vrrp_script chk_haproxy {
  script "pidof haproxy"
  interval 2
}

vrrp_instance VI_1 {
  interface eth0
  state BACKUP
  priority 100
  nopreempt

  virtual_router_id 33
  unicast_src_ip 172.30.1.101
  unicast_peer {
    172.30.1.100
  }

  authentication {
    auth_type PASS
    auth_pass PASSWORD
  }

  track_script {
    chk_haproxy
  }
  notify_master /etc/keepalived/master.sh
}

I've set up the security group rules as follows:

HTTP              TCP           80   0.0.0.0/0
Custom ICMP Rule  Echo Reply    N/A  0.0.0.0/0
SSH               TCP           22   0.0.0.0/0
Custom Protocol   VRRP (112)    All  0.0.0.0/0
Custom ICMP Rule  Echo Request  N/A  0.0.0.0/0

However, the following command always times out from the backup (and the same happens in reverse from the master):

nc -vz 172.30.1.100 112

Also, the following command never returns anything, which makes me think the VRRP packets are still not getting through for some reason:

sudo tshark -f "vrrp"

score 0 · Answer 1 · answered Jan 28 '16 at 04:10


Your netcat command tests TCP port 112, not IP protocol 112. VRRP is its own IP protocol and has no ports, so that test will always fail, and netcat is the wrong tool for checking it. Use either of these commands to see whether VRRP traffic is present on either instance:

tcpdump "ip proto 112"
tshark -f "vrrp"
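
Because the configs in the question use unicast peers, it can help to narrow the capture to the other node. The following is a minimal sketch, not part of the original answer: `vrrp_filter` is a hypothetical helper name, and `eth0` and the peer address are taken from the question's configs.

```shell
#!/bin/sh
# vrrp_filter PEER_IP: print a capture filter that matches only VRRP
# (IP protocol 112) packets to or from PEER_IP.
# Hypothetical helper for illustration; it is not part of tcpdump or tshark.
vrrp_filter() {
  printf 'ip proto 112 and host %s' "$1"
}

# Example (needs root; run on 172.30.1.100 to watch its peer):
#   sudo tcpdump -n -i eth0 "$(vrrp_filter 172.30.1.101)"
```

If nothing shows up on the sending node itself, the packets are not leaving the host, which points at the local keepalived configuration rather than the security group.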

Your configs should define one of the servers as MASTER, the other as BACKUP. The priority should be 100 on the BACKUP, 101 on the MASTER.

Having them both set to BACKUP may be your issue.

answered Jan 28 '16 at 04:10 by LinuxNinja

At the end of the question I mentioned that exact tshark command. I also tried the tcpdump command, and neither returned anything on either server. I tried MASTER/BACKUP as well with the same results, but I want the new master to stay master even when the old one comes back online. It is my understanding that if the priorities are equal, the higher IP address is used as the tiebreaker. – James Simpson Jan 28 '16 at 04:19
In your security group, are you allowing outbound protocol 112 as well as inbound protocol 112? – LinuxNinja Jan 28 '16 at 04:29
Outbound is set to allow all traffic on all protocols and all ports. – James Simpson Jan 28 '16 at 13:39

score 0 · Answer 2 · answered Jan 28 '16 at 15:09

The issue turned out to be painfully obvious once I slept on it and took a second look (aren't they always). It was as simple as there being a typo in the unicast_src_ip. Since the IP was incorrect, no messages were going through on either server. I would have thought there'd be some error message for this, but everything started working 100% once this was fixed.
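
Since keepalived gave no error for this, a quick sanity check is to compare the configured `unicast_src_ip` against the addresses actually assigned to the host. This is a rough sketch under assumptions: the helper names are hypothetical, the config path in the example is assumed to be the usual default location, and the exact typo from my setup is not reproduced here.

```shell
#!/bin/sh
# conf_src_ip FILE: print the first unicast_src_ip value found in FILE.
# (Hypothetical helper; assumes the directive and its value share a line.)
conf_src_ip() {
  awk '$1 == "unicast_src_ip" { print $2; exit }' "$1"
}

# ip_is_listed IP LIST: succeed only if IP matches a whole line of LIST,
# so 172.30.1.10 does not match 172.30.1.100.
ip_is_listed() {
  printf '%s\n' "$2" | grep -Fxq -- "$1"
}

# Example (config path is an assumption; `ip` comes from iproute2):
#   local_ips=$(ip -4 -o addr show | awk '{ sub(/\/.*/, "", $4); print $4 }')
#   src=$(conf_src_ip /etc/keepalived/keepalived.conf)
#   ip_is_listed "$src" "$local_ips" || echo "unicast_src_ip $src is not on this host" >&2
```

Running this on each node before starting keepalived would have flagged the mismatch immediately.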
