keepalived doing a lot of TCP DUP ACK and TCP Retransmission

Question

I'm trying to replicate the setup our hosting provider has built for our load balancer, which uses keepalived. I have one load balancer running CentOS 6 and keepalived 1.2.7, and two web servers running Ubuntu 12.04 LTS and Apache 2.2.

If I query one of the two web servers directly, it works fine and I get the response in a couple of milliseconds. But if I query the website through the load balancer, it takes one minute to get the response.

I fired up Wireshark on the load balancer, and I see a lot of TCP DUP ACK and TCP Retransmission packets from both sides (my Mac and the load balancer).
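
To put a number on what the capture shows, the kernel's own TCP counters can be read on a Linux host that terminates the connections (in this setup, the Ubuntu web servers; with `lb_kind NAT` the load balancer only forwards packets, so its own TCP counters would not include them). The following is a minimal sketch, assuming `/proc/net/snmp` is available; the function names `parse_tcp_counters` and `retransmit_ratio` are made up for illustration.

```python
import os

SNMP_PATH = "/proc/net/snmp"


def parse_tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp as {name: int}.

    The file holds pairs of lines per protocol: a header line with the
    counter names and a second line with the values, both prefixed "Tcp:".
    """
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Tcp:")]
    if len(rows) < 2:
        raise ValueError("no Tcp section found")
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(v) for v in values)))


def retransmit_ratio(counters):
    """Fraction of sent segments that were retransmissions (since boot)."""
    sent = counters.get("OutSegs", 0)
    return counters.get("RetransSegs", 0) / sent if sent else 0.0


if __name__ == "__main__" and os.path.exists(SNMP_PATH):
    with open(SNMP_PATH) as f:
        tcp = parse_tcp_counters(f.read())
    print("RetransSegs:", tcp.get("RetransSegs"))
    print("retransmit ratio: {:.4f}".format(retransmit_ratio(tcp)))
```

The counters are cumulative since boot, so reading them before and after a slow request and comparing the two values is more telling than a single reading.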

Anyone having the same problem?

Configuration:

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 5
}

vrrp_script chk_http {
    script "killall -0 apache2"
    interval 2
    weight 5
}

vrrp_instance VI_LOCAL {
    interface eth1
    state MASTER
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        10.6.79.1
    }
    track_script {
        chk_haproxy
    }
    track_interface {
        eth0
        eth1
    }
}

vrrp_instance VI_PUB {
    interface eth0
    state MASTER
    virtual_router_id 52
    priority 101
    virtual_ipaddress {
        192.168.1.129
        192.168.1.127
        192.168.1.128
    }
    track_script {
        chk_haproxy
        #chk_http
    }
    track_interface {
        eth0
        eth1
    }
}

virtual_server 192.168.1.129 80 {
   delay_loop 6
   lb_algo rr
   lb_kind NAT
   protocol TCP

   real_server 10.6.79.10 80 {
           weight 1
           TCP_CHECK {
                   connect_timeout 180
           }
   }
   real_server 10.6.79.11 80 {
           weight 1
           TCP_CHECK {
                   connect_timeout 180
           }
   }
}

virtual_server 192.168.1.129 443 {
   delay_loop 6
   lb_algo rr
   lb_kind NAT
   protocol TCP

   real_server 10.6.79.10 443 {
           weight 1
           TCP_CHECK {
                   connect_timeout 180
           }
   }
   real_server 10.6.79.11 443 {
           weight 1
           TCP_CHECK {
                   connect_timeout 180
           }
   }
}

score 1 · Answer 1 · answered Sep 25 '13 at 01:07

It may be that you are having errors farther down the stack that are causing TCP retransmissions. This could be as simple as a bad Ethernet cable and/or NIC. Run "ifconfig" and look for errors. Also look in /var/log/messages.
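
The error counters that "ifconfig" prints are also exposed in `/proc/net/dev` on Linux, which makes them easy to check from a script. The following is a minimal sketch, assuming the standard 16-column layout of that file; the helper names `parse_net_dev` and `interfaces_with_errors` are made up for illustration.

```python
import os

NET_DEV_PATH = "/proc/net/dev"


def parse_net_dev(text):
    """Return {iface: {counter: value}} for the error-related counters.

    Each data line holds 8 receive fields followed by 8 transmit fields:
    rx: bytes packets errs drop fifo frame compressed multicast
    tx: bytes packets errs drop fifo colls carrier compressed
    """
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        if len(fields) < 16:
            continue
        stats[name.strip()] = {
            "rx_errs": fields[2],
            "rx_drop": fields[3],
            "tx_errs": fields[10],
            "tx_drop": fields[11],
            "tx_colls": fields[13],
        }
    return stats


def interfaces_with_errors(stats):
    """Keep only the interfaces where at least one counter is non-zero."""
    return {name: c for name, c in stats.items() if any(c.values())}


if __name__ == "__main__" and os.path.exists(NET_DEV_PATH):
    with open(NET_DEV_PATH) as f:
        bad = interfaces_with_errors(parse_net_dev(f.read()))
    for name, counters in bad.items():
        print(name, counters)
    if not bad:
        print("no interface errors or drops recorded")
```

Run it on the load balancer and on both web servers. An empty result only rules out errors the kernel counted on that host, not problems elsewhere on the path.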

If you eliminate layer 1 hardware as the cause, check for layer 2 issues, such as a speed/duplex mismatch on the switch ports.
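
On the Linux side, the negotiated speed and duplex of each NIC can be read from sysfs and compared with what the switch port reports. The following is a minimal sketch, assuming the interfaces are named eth0 and eth1 as in the question's config; the function name `link_settings` is made up for illustration. Virtual NICs often do not report these values at all, in which case the sketch returns None for them.

```python
import os


def link_settings(iface, sysfs_root="/sys/class/net"):
    """Read speed (Mb/s), duplex and operstate for one interface.

    A value is None when the attribute file is missing or unreadable,
    which is common for virtual NICs or interfaces without a link.
    """
    result = {}
    for attr in ("speed", "duplex", "operstate"):
        path = os.path.join(sysfs_root, iface, attr)
        try:
            with open(path) as f:
                result[attr] = f.read().strip()
        except OSError:
            result[attr] = None
    return result


if __name__ == "__main__":
    # eth0 and eth1 are the interface names used in the question's config.
    for iface in ("eth0", "eth1"):
        print(iface, link_settings(iface))
```

A "half" duplex value on one end while the switch port is forced to full is the classic mismatch that produces retransmissions under load.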

nandoP


Agreed. Start at layer 1 and work your way up from there. – joeqwerty Sep 25 '13 at 01:13
I did check for errors in the logs and in ifconfig, and I didn't find any. As for the cable, the three servers are actually XenServer 6.2 VMs, so they share the same hardware apart from the virtual NICs. – Pascal Robert Sep 25 '13 at 13:44
Recreated the load balancer as an Ubuntu VM, and everything works. Ubuntu 12.04 has keepalived 1.2.2 and CentOS 6 has 1.2.7, so maybe it's a difference between the versions, or in the network stack (I did put the same settings in sysctl.conf on both VMs). – Pascal Robert Sep 25 '13 at 16:16
