
The nginx monitoring script from ztc intermittently fails to load the nginx test page, mostly under peak load of about 2000 rps (nginx is used as a proxy). This triggers "nginx is down" alerts in Zabbix, and a second later everything appears to be OK again.

 [NginxStatus] 2015-12-16 20:24:55,289 - ERROR: failed to load test page
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ztc/nginx/__init__.py", line 56, in _read_status
        u = urllib2.urlopen(url, None, 1)
      File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib64/python2.6/urllib2.py", line 391, in open
        response = self._open(req, data)
      File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
        '_open', req)
      File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
        result = func(*args)
      File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
        raise URLError(err)
    URLError: <urlopen error timed out>

Since it happens only under peak load, about 2000 rps, I suspect that some kernel parameters are causing this.
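To check whether the timeout really tracks load, the failing request can be reproduced outside of ztc. The following is a minimal sketch, not ztc's actual code: it assumes Python 3 (so `urllib.request` instead of the `urllib2` seen in the traceback), keeps the same 1-second timeout, and the names `probe`, `summarize` and `run_probes` are hypothetical.

```python
import socket
import time
import urllib.error
import urllib.request


def probe(url, timeout=1.0, opener=urllib.request.urlopen):
    """Fetch url once and return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        response = opener(url, None, timeout)
        try:
            response.read()
        finally:
            response.close()
        return True, time.monotonic() - start, ""
    except (urllib.error.URLError, socket.timeout, OSError) as exc:
        return False, time.monotonic() - start, str(exc)


def summarize(results):
    """Count failed attempts and report the slowest one."""
    failures = sum(1 for ok, _, _ in results if not ok)
    slowest = max((elapsed for _, elapsed, _ in results), default=0.0)
    return {"attempts": len(results), "failures": failures, "slowest": slowest}


def run_probes(url, attempts=60, interval=1.0, timeout=1.0):
    """Probe url repeatedly, sleeping `interval` seconds between attempts."""
    results = []
    for _ in range(attempts):
        results.append(probe(url, timeout))
        time.sleep(interval)
    return summarize(results)
```

Calling `run_probes()` with the URL that ztc is configured to fetch, once during peak load and once off-peak, would show whether failures and the slowest response time follow the request rate.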

Here's the nginx configuration:

user nginx;
worker_processes  4;
timer_resolution 100ms;
worker_priority -15;
worker_rlimit_nofile 200000;

error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;

events {
  worker_connections  65536;
  use epoll;
  multi_accept on;
}
http {

  include       /etc/nginx/mime.types;
  default_type  application/octet-stream;

  server_tokens off;

  access_log    /var/log/nginx/access.log;

  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;

#  keepalive_requests 120;
#  keepalive_timeout  65;


  gzip  on;
  gzip_http_version 1.0;
  gzip_comp_level 2;
  gzip_proxied any;
  gzip_vary off;
  gzip_types text/plain text/css application/x-javascript text/xml application/xml application/rss+xml application/atom+xml text/javascript application/javascript application/json text/mathml;
  gzip_min_length  1000;
  gzip_disable     "MSIE [1-6]\.";


  variables_hash_max_size 1024;
  variables_hash_bucket_size 64;
  server_names_hash_bucket_size 64;
  types_hash_max_size 2048;
  types_hash_bucket_size 64;



  include /etc/nginx/conf.d/*.conf;
  include /etc/nginx/sites-enabled/*;
}

Here's sysctl.conf:

net.ipv4.conf.all.secure_redirects=0
net.ipv4.conf.all.send_redirects=0
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.netfilter.nf_conntrack_max=1048576
net.nf_conntrack_max=1048576
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_tw_reuse=1
net.core.somaxconn=15000
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_max_tw_buckets=720000
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_fin_timeout=30
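Before looking for a mistake in these values, it may be worth confirming that the running kernel actually uses them. This is a rough sketch under stated assumptions: each dotted key maps to a file under `/proc/sys` with dots replaced by slashes (true for the keys listed here), and the helper names `parse_sysctl_conf`, `read_live` and `find_mismatches` are made up for illustration.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into {key: value}, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Collapse whitespace so "4096  87380" compares equal to "4096 87380".
            settings[key.strip()] = " ".join(value.split())
    return settings


def read_live(key, root="/proc/sys"):
    """Return the live value for a sysctl key, or None if it cannot be read."""
    path = root + "/" + key.replace(".", "/")
    try:
        with open(path) as handle:
            return " ".join(handle.read().split())
    except OSError:
        return None


def find_mismatches(settings, reader=read_live):
    """Return {key: (configured, live)} for every key whose live value differs."""
    mismatches = {}
    for key, wanted in settings.items():
        live = reader(key)
        if live != wanted:
            mismatches[key] = (wanted, live)
    return mismatches
```

Running `find_mismatches(parse_sysctl_conf(open("/etc/sysctl.conf").read()))` on the server would list any setting that did not take effect. A live value of `None` means the key could not be read, for example because the conntrack module is not loaded.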

And netstat output:

netstat -an | grep -e :80 -e :443 |awk '/^tcp/ {A[$(NF)]++} END {for (I in A) {printf "%5d %s\n", A[I], I}}'

18525 TIME_WAIT
    1 CLOSE_WAIT
  499 FIN_WAIT1
 1544 FIN_WAIT2
33311 ESTABLISHED
  563 SYN_RECV
    7 CLOSING
  294 LAST_ACK
    3 LISTEN
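For repeated sampling, the same tally can be done in Python so the counts can be logged over time. This is a sketch that mirrors the `grep`/`awk` pipeline above, including its substring matching (so `:80` also matches ports such as `:8080`); the function name `count_states` is hypothetical.

```python
from collections import Counter


def count_states(netstat_output, needles=(":80", ":443")):
    """Count TCP states (last column) for lines mentioning any of `needles`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        # Same filters as the shell pipeline: tcp lines containing :80 or :443.
        if not line.startswith("tcp"):
            continue
        if not any(needle in line for needle in needles):
            continue
        fields = line.split()
        if fields:
            counts[fields[-1]] += 1
    return counts
```

The input can come from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`. Sampling this once per second around the time of a failed check would show whether `SYN_RECV` or `TIME_WAIT` spikes at that moment.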

What could be the root cause of this? Are these netstat numbers abnormal for 2000 rps? Is there a mistake in my sysctl.conf that is leading to this problem?

d.ansimov