
Recently we've been experiencing issues with our Varnish (3x) -> Apache (3x) setup, resulting in a huge spike in SYN_SENT connections.

The spike itself is due to the amount of new traffic hitting the site (not a DDoS of any kind), and it seems our Varnish machines have trouble forwarding traffic to the backend servers (drops in Apache traffic correlate with spikes on the Varnishes), exhausting the available port pool with connections stuck in SYN_SENT.

Keep-alives are enabled on Apache (15s).

Which side is at fault? The amount of traffic is significant, but it should by no means be enough to stall a setup like this (3x Varnish frontend machines, 3x backend Apache servers).

Please help.

A Munin screenshot of connections through the firewall is here.

Varnish ~$ netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c

      9 CLOSE_WAIT
     12 CLOSING
    718 ESTABLISHED
     39 FIN_WAIT1
   1714 FIN_WAIT2
     76 LAST_ACK
     12 LISTEN
    256 SYN_RECV
   6124 TIME_WAIT
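It may also help to check whether the backends are actually dropping connection attempts, and to count the stuck connections directly. A diagnostic sketch (not from the original post; the `$6` field assumes standard `netstat -ant` output):

```shell
# On each Apache backend: cumulative listen-queue overflow / SYN drop
# counters since boot -- run twice a few seconds apart and compare.
netstat -s | egrep -i 'overflowed|SYNs to LISTEN'

# On each Varnish machine: how many connections are stuck in SYN_SENT
# right now (connection state is field 6 in netstat output).
netstat -ant | awk '$6 == "SYN_SENT"' | wc -l
```

If the overflow counters on Apache climb during a spike, the accept queue there is the bottleneck rather than Varnish itself.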

/etc/sysctl.conf (Varnish)

net.ipv4.netfilter.ip_conntrack_max = 262144
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.ip_local_port_range = 1024 65536
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_fin_timeout = 30
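Since ip_conntrack_max is being tuned here and the firewall lives on the Varnish machines, it's also worth ruling out a full conntrack table, which silently drops new connections. A sketch (the /proc paths differ between older and newer kernels, hence the fallback):

```shell
# Current conntrack entries vs. the configured ceiling. Older kernels
# expose ip_conntrack_*, newer ones nf_conntrack_*; try both paths.
count=$(cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count 2>/dev/null \
    || cat /proc/sys/net/netfilter/nf_conntrack_count)
max=$(cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max 2>/dev/null \
    || cat /proc/sys/net/netfilter/nf_conntrack_max)
echo "conntrack: $count / $max"
```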

Apache ~$ netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c

     11 CLOSE_WAIT
    286 ESTABLISHED
     38 FIN_WAIT2
     14 LISTEN
   7220 TIME_WAIT

/etc/sysctl.conf (Apache)

vm.swappiness=10
net.core.wmem_max = 524288
net.core.wmem_default = 262144
net.core.rmem_default = 262144
net.core.rmem_max = 524288
net.ipv4.tcp_rmem = 4096 262144 524288
net.ipv4.tcp_wmem = 4096 262144 524288
net.ipv4.tcp_mem = 4096 262144 524288

net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_keepalive_time = 30

net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.core.somaxconn = 2048


net.ipv4.conf.lo.arp_ignore=8
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2

vm.swappiness = 0

kernel.sysrq=1
kernel.panic = 30
    Where's the firewall located? The only system with high `SYN_SENT` stats is the firewall; are you saying that it seems like the firewall is the bottleneck? – Shane Madden Dec 26 '12 at 22:31
  • The firewall with high SYN_SENT is located on the Varnish machines. – user150997 Dec 27 '12 at 07:38
  • more eth/conntrack stats here: http://grab.by/iA2M – user150997 Dec 27 '12 at 07:48
  • What are your /proc/sys/net/ipv4/tcp_max_tw_buckets and tcp_max_syn_backlog set to? (Mine are 180000, i.e. 180k TIME_WAITs, and 1024 — increase when more memory is present.) Also, why have you turned on tw_recycle? Wouldn't that cause errors? (or is that recycle?) – Grizly Feb 26 '13 at 02:21
    You may want to consider setting net.ipv4.tcp_tw_recycle to zero - especially if load balancing. I've had issues with HAproxy at high concurrency with this enabled. Also, I would disable iptables during testing. I've seen some odd results with connection tracking when used in a load balanced environment. – jeffatrackaid Nov 14 '13 at 21:13
  • When you say you're running out of open ports, exactly what do you mean? Which component is complaining, and on which side (browser -> varnish, varnish -> apache)? How is traffic directed at Varnish? Is it NATed? – Kjetil Joergensen Feb 26 '14 at 04:51
  • How many threads have you allowed in varnish (debian/ubuntu /etc/default/varnish)? Also, what's your open-file limit? – Kirrus Mar 14 '14 at 18:26

2 Answers


Your problem is probably with the sysctl on the Apache servers.

Some assumptions: Typically Varnish is vastly faster at processing each connection than a webserver (unless perhaps your Varnish servers have much less CPU, and your Apache servers are only serving static files cached in memory.) I'm going to assume your connections process faster in Varnish than Apache.

Therefore, resources on your Apache servers may be ample, but requests will have to queue up somewhere, if only very briefly. Right now they are not queuing up in a healthy way where they eventually get processed.

It seems like your requests are getting stuck in Varnish and not making it to the Apache servers.

There is some evidence for this:

Notice in your Munin graph that before the SYN_SENTs get backed up, connections in TIME_WAIT increase; then, past a certain point, they start to pile up as SYN_SENT. This indicates that requests are first being answered more slowly, and then the queue backs up and requests don't get answered at all.

This indicates to me that your Apache server isn't accepting enough connections (where they can then sit and queue up for Apache to process them.)

I see several possible limits in your config file:

When you have a spike, you have approximately 30000 connections in the SYN_SENT state on your Varnish servers.

However, on the Apache server your max_syn_backlog is only 16384. Your somaxconn is only 2048.

Notice also that the network memory buffers on the Apache servers are very small. You have raised them on the Varnish servers to 16 MB, but on the Apache servers your net.ipv4.tcp_rmem maxes out at only 524 KB, matching your net.core.rmem_max.

I recommend raising all of these parameters on the Apache server.
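A hedged sketch of what that might look like on the Apache servers (the specific values are illustrative, sized for the ~30000 concurrent connection attempts seen on the Varnish side; tune to your memory budget, and remember Apache's own ListenBacklog directive also caps the accept queue):

```shell
# Run as root on each Apache server. Raises the accept-queue and
# SYN-backlog limits so bursts can queue instead of being dropped,
# and brings the socket buffers in line with the Varnish boxes.
cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.tcp_max_syn_backlog = 65536
net.core.somaxconn = 16384
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
EOF
sysctl -p
```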

You're going to need to focus more on the diagnostics on the Apache server to find out exactly what is going on, but you might not need to if you raise these values.

You should probably not adjust net.ipv4.tcp_mem at all. Note that the unit for this parameter is pages, not bytes, so copying the same value from net.ipv4.tcp_rmem or net.ipv4.tcp_wmem (both in bytes) doesn't make sense. It is auto-tuned by Linux based on the amount of RAM, so it rarely needs adjustment. In fact, this may be your problem: arbitrarily limiting the memory available for overall connection queuing.
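To see the unit mismatch concretely (assuming a typical 4 KB page; verify with `getconf PAGESIZE`), the value 524288 means 512 KB when read as bytes in tcp_rmem, but something entirely different when read as pages in tcp_mem:

```shell
# tcp_mem is measured in pages; tcp_rmem/tcp_wmem are in bytes.
page_size=4096            # typical on x86; verify with: getconf PAGESIZE
tcp_mem_max_pages=524288  # the value copied into tcp_mem above
echo $((tcp_mem_max_pages * page_size))  # 2147483648 bytes, i.e. 2 GB
```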

See: http://russ.garrett.co.uk/2009/01/01/linux-kernel-tuning/

Also notice that vm.swappiness is set twice in your config: first as 10, and again at the bottom as 0, which is the value that takes effect.


On the Varnish server, try to change these 2 parameters:

net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 1

tw_reuse will allow it to reuse the connections in TIME_WAIT.

tw_recycle, on the other hand, can cause issues with load balancers and clients behind NAT.
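Applied at runtime this would look like the following (persist the same keys in /etc/sysctl.conf to survive reboots). Note that tcp_tw_reuse only affects outgoing connections, which is exactly the Varnish-to-backend direction:

```shell
# On each Varnish machine (requires root):
sysctl -w net.ipv4.tcp_tw_recycle=0  # recycle breaks clients behind NAT
sysctl -w net.ipv4.tcp_tw_reuse=1    # reuse applies to outbound connections only
```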

Florin Asăvoaie