
On my SLES server I run keepalived and haproxy. In two days I have had 4 disconnects from keepalived.
Keepalived v1.2.7 (11/20,2012)

Syslog shows only these messages. Can anyone help me solve this problem?

Keepalived_vrrp[28102]: VRRP_Script(chk_haproxy) timed out
Keepalived_vrrp[28102]: Process [448] didn't respond to SIGTERM
Keepalived_vrrp[28102]: Process [450] didn't respond to SIGTERM
Keepalived_vrrp[28102]: VRRP_Script(chk_haproxy) succeeded

My config looks like this:

vrrp_script chk_haproxy {           
        script "killall -0 haproxy"    
        interval 2                     
        weight 2                        
}
vrrp_instance VIP_1 {
        interface eth2
        state MASTER
        virtual_router_id 88
        priority 101                   
        virtual_ipaddress {
            192.168.1.95
        }
        track_script {
            chk_haproxy
        }
}
user3904465

1 Answer


We have a similar setup, but with kamailio instead of haproxy. We were seeing messages like those too, so we changed the way we performed the checks (our checks have nothing to do with yours; we were checking that kamailio responds to OPTIONS requests).

You can try adding `fall 3`, which means the check script has to fail 3 times before the state changes. Also, as far as I can tell, `weight` is not needed in the vrrp_script section.

vrrp_script chk_haproxy {           
        script "killall -0 haproxy"    
        interval 2                     
        fall 3                        
}
vrrp_instance VIP_1 {
        interface eth2
        state MASTER
        virtual_router_id 88
        priority 101                   
        virtual_ipaddress {
            192.168.1.95
        }
        track_script {
            chk_haproxy
        }
}
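
To illustrate what `fall 3` changes, here is a small shell sketch that replays a series of check exit codes and reports the resulting state. It is not keepalived code: `simulate_fall` is a hypothetical helper, and it assumes that failures must be consecutive and that a single success brings the check back up.

```shell
#!/bin/sh
# Hypothetical simulation of the `fall` counter; not taken from keepalived.
# Usage: simulate_fall FALL RC [RC...]
#   FALL is the number of failures required before the check counts as down.
#   Each RC is the exit code of one successive check run (0 = success).
simulate_fall() {
    fall=$1
    shift
    failures=0
    state=up
    for rc in "$@"; do
        if [ "$rc" -eq 0 ]; then
            # Assumption: one success resets the counter and restores "up".
            failures=0
            state=up
        else
            failures=$((failures + 1))
            if [ "$failures" -ge "$fall" ]; then
                state=down
            fi
        fi
    done
    echo "$state"
}
```

With `simulate_fall 3 0 1 1 0` a short blip of two failed checks leaves the state at `up`, while `simulate_fall 3 1 1 1` ends in `down`. That is why a single timed-out check should no longer cause a disconnect.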

Good luck!

charli
  • Another thing we did was to add a timeout to the script/command. Try this: `script "timeout 1 killall -0 haproxy"` – charli Oct 09 '15 at 10:04
  • How does timeout 1 help? – Dave Stibrany Feb 29 '16 at 04:26
  • It prevents the line `VRRP_Script(chk_haproxy) timed out`, because we're forcing the timeout ourselves. It's still a failed check, but we concluded that it was a client issue rather than a service issue, so we increased `fall` to 3. FYI, we were using the `sipsak` binary instead of `killall`. – charli Feb 29 '16 at 09:23
  • But won't `timeout` return a non-zero code if it does in fact timeout? I thought `vrrp_script` considers non-zero codes failures, am I missing something? – Dave Stibrany Feb 29 '16 at 20:48
  • Yeah, you're right. That's why the `fall` directive is increased to 3. The thing is that, somehow, this works better because (I'm guessing here) Keepalived doesn't need to handle the timeout of the `vrrp_script` itself. I think that when Keepalived handles the timeout, it messes up the following checks, leading to a disconnect. As a disclaimer, this is only my hypothesis: I haven't read the Keepalived code, nor would I if I can avoid it. This happens to work for us (several months without a disconnect), but it may not work for you. – charli Mar 01 '16 at 10:01
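
For anyone following the exit-code discussion above, a minimal sketch of a check wrapper, assuming GNU coreutils `timeout` (which exits with status 124 when the time limit is hit). `run_check` is a hypothetical name, not something keepalived provides.

```shell
#!/bin/sh
# Hypothetical wrapper: run a check command under a time limit and pass its
# exit status through, so the caller sees a plain non-zero failure.
# Usage: run_check SECONDS COMMAND [ARGS...]
run_check() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout uses 124 to signal that the command was killed on timeout.
        echo "check timed out after ${limit}s" >&2
    fi
    return "$status"
}
```

Calling `run_check 1 killall -0 haproxy` returns 0 when the process exists and non-zero otherwise, including on a timeout, so a slow check is still counted as a failure and only `fall` decides whether it triggers a state change.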