
I'm having a challenge discovering why the results are as they are, and am perhaps missing something obvious. Sorry this isn't very specific, but if anyone has any pointers on areas to focus on, that would be very helpful. Cheers.

Load Test

It's about 5,486 writes to the DB per minute (~91 per second). I can see the following errors in the logs as the servers become overwhelmed:

  • (11: Resource temporarily unavailable) while connecting to upstream
  • Upstream timed out (110: Connection timed out) while reading response header from upstream
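
The two errors point at different bottlenecks: error 11 (EAGAIN) means Nginx couldn't hand the request to PHP-FPM because the listen backlog was full, while error 110 means an FPM worker held a request longer than the read timeout. A quick way to see which dominates is to tally them in the error log; the sample lines below are stand-ins for the real /var/log/nginx/error.log.

```shell
# Classify upstream errors: EAGAIN (11) = FPM backlog full, 110 = slow worker.
# The sample stands in for the real error log; on a live server replace it
# with: sample=$(cat /var/log/nginx/error.log)
sample='connect() failed (11: Resource temporarily unavailable) while connecting to upstream
upstream timed out (110: Connection timed out) while reading response header from upstream'

eagain=$(printf '%s\n' "$sample" | grep -c 'Resource temporarily unavailable')
timeouts=$(printf '%s\n' "$sample" | grep -c 'Connection timed out')
echo "backlog-full errors: $eagain, worker timeouts: $timeouts"
```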

Issue

When running the load test I see the following problems:

  • The page that performs the update/write (the one the load test is hitting) slows down and takes 10-20 seconds to load.
  • Arbitrary 404s returned by Nginx on any of the pages. Results show that roughly 10-20% of requests during the peak period result in a 404.

I think these are two separate and possibly unrelated issues. I can't see any flat lines in the graphs, which would suggest a limit being reached.
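
The 404 rate can be quantified by tallying status codes in the access log; with a combined-style log format, the status code is the 9th whitespace-separated field. The sample lines below are placeholders for the real /var/log/nginx/access.log.

```shell
# Tally status codes from the access log to see how many requests 404.
# Sample lines stand in for /var/log/nginx/access.log; with a combined-style
# format, the status code is the 9th whitespace-separated field.
sample='1.2.3.4 - - [01/May/2014:12:00:01 +0000] "POST /update HTTP/1.1" 200 512 "-" "loadtest"
1.2.3.4 - - [01/May/2014:12:00:02 +0000] "GET /page HTTP/1.1" 404 162 "-" "loadtest"'

n404=$(printf '%s\n' "$sample" | awk '$9 == 404' | wc -l)
total=$(printf '%s\n' "$sample" | wc -l)
echo "404s: $n404 of $total requests"
```

Breaking the 404s down further by `$7` (the request path) would show whether they cluster on specific pages or really are arbitrary.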

  • Web servers sit around 60% CPU and remain stable. RAM looks OK.
  • Database servers sit around 20% CPU and remain stable. RAM looks OK.
  • Database connections climb to 1500-2000. This looks iffy, although it doesn't flat-line, which suggests it's not hitting a limit.
  • Network connection limits appear to be OK.
  • Tables are indexed where possible/appropriate.

Infrastructure

AWS RDS MySQL: 1 x db.m3.xlarge for write operations, 1 x db.m3.xlarge replicated DB for read operations

AWS EC2 web servers: 6 x c3.2xlarge running Linux, Nginx, and PHP-FPM

Config

/etc/php-fpm.d/domain.com.conf

[domain.com]

user = nginx
group = nginx

;;;The address on which to accept FastCGI requests
listen = /var/run/php-fpm/domain.com.sock

;;;A value of '-1' means unlimited, although this may be based on the ulimit hard limit.
;;;May be worth setting explicitly in case of the above.
listen.backlog = -1

;;;dynamic - the number of child processes is set dynamically based on the following directives: pm.max_children, pm.start_servers, pm.min_spare_servers, pm.max_spare_servers.
pm = dynamic
;;;maximum number of child processes to be created when pm is set to dynamic
pm.max_children = 512
;;;The number of child processes created on startup. Used only when pm is set to dynamic.
pm.start_servers = 8
;;;The desired minimum number of idle server processes. Used only when pm is set to dynamic.
pm.min_spare_servers = 2
;;;The desired maximum number of idle server processes. Used only when pm is set to dynamic.
pm.max_spare_servers = 16
;;;The number of requests each child process should execute before respawning.
pm.max_requests = 500
;;;The URI to view the FPM status page.
pm.status_path = /status/fpm/domain.com
;;;The timeout for serving a single request. This option should be used when the 'max_execution_time' ini option does not stop script execution
request_terminate_timeout = 30

;;;Set open file descriptor rlimit. Default value: system defined value.
;;;rlimit_files

;;;rlimit_core int
;;;Set max core size rlimit. Possible Values: 'unlimited' or an integer greater or equal to 0. Default value: system defined value.

php_admin_value[post_max_size] = 8M
php_admin_value[upload_max_filesize] = 8M

php_admin_value[disable_functions] = exec,passthru,system,proc_open,popen,show_source

;;; Site specific custom flags go here

;;; End of site specific flags

slowlog = /var/log/nginx/slow-query-$pool.log

request_slowlog_timeout = 10s

chdir = /
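
One thing worth sanity-checking in the pool above: pm.max_children = 512 only helps if 512 PHP workers actually fit in RAM. The 40 MB per-child figure below is a hypothetical placeholder (measure the real average RSS on a live box, e.g. with `ps -o rss= -C php-fpm`), but even at that modest size the worst case overshoots a c3.2xlarge's 15 GB:

```shell
# Worst-case PHP-FPM memory: pm.max_children x average per-child RSS.
MAX_CHILDREN=512
AVG_CHILD_MB=40               # hypothetical figure; measure on your own servers
TOTAL_RAM_MB=$((15 * 1024))   # c3.2xlarge has 15 GB
WORST_CASE_MB=$((MAX_CHILDREN * AVG_CHILD_MB))
echo "worst case: ${WORST_CASE_MB} MB of ${TOTAL_RAM_MB} MB available"
```

If the children would outgrow RAM, either lower pm.max_children or trim per-child memory; otherwise the box swaps and response times balloon well before CPU shows any stress.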

Nginx - /etc/nginx/nginx.conf

events {
    worker_connections 19000;
    # essential for linux, optimized to serve many clients with each thread
    use epoll;
    multi_accept on;
}
worker_rlimit_nofile 20000;

http {
    include         /etc/nginx/mime.types;
    default_type    application/octet-stream;

    log_format  proxy_combined  '$http_x_real_ip - $remote_user [$time_local] "$request" '
                                '$status $body_bytes_sent "$http_referer" "$http_user_agent"';

    access_log      /var/log/nginx/access.log   proxy_combined;

    sendfile        on;

    ## Start: Size Limits & Buffer Overflows ##
    client_body_buffer_size     1K;
    client_header_buffer_size   1k;
    # client_max_body_size        1k;
    large_client_header_buffers 2 1k;
    ## END: Size Limits & Buffer Overflows ##

    ## Start: Caching file descriptors ##
    open_file_cache             max=1000 inactive=20s;
    open_file_cache_valid       30s;
    open_file_cache_min_uses    2;
    open_file_cache_errors      on;
    ## END: Caching file descriptors ##

    ## Start: Timeouts ##
    client_body_timeout   10;
    client_header_timeout 10;
    keepalive_timeout     5 5;
    send_timeout          10;
    ## End: Timeouts ##

    server_tokens       off;
    tcp_nodelay         on;

    gzip                on;
    gzip_http_version   1.1;
    gzip_vary           on;
    gzip_comp_level     6;
    gzip_proxied        any;
    gzip_types          text/plain text/html text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript application/javascript text/x-js;
    gzip_buffers        16 8k;
    gzip_disable        "MSIE [1-6]\.(?!.*SV1)";

    client_max_body_size 30M;

    proxy_cache_path /var/cache/nginx/c2b levels=1:2 keys_zone=c2b-cache:8m max_size=100m inactive=60m;
    proxy_temp_path /var/cache/tmp;
    proxy_ignore_headers Set-Cookie X-Accel-Expires Expires Cache-Control;

    # allow the server to close the connection after a client stops responding. Frees up socket-associated memory.
    reset_timedout_connection on;

    include /etc/nginx/conf.d/*.conf;
}

NGINX Site specific - /etc/nginx/conf.d/domain.com

# pass the PHP scripts to the FastCGI server listening on the unix socket
location ~ \.php$ {
    fastcgi_pass unix:/var/run/php-fpm/domain.com.sock;
    fastcgi_index  index.php;

    fastcgi_param  SCRIPT_FILENAME  /var/www/domain.com/public_html/$fastcgi_script_name;
    include        fastcgi_params;
    fastcgi_read_timeout 30;
}
hokeycokey

1 Answer


I got down to the crux of the issue. I changed the MySQL db tables from MyISAM to InnoDB (where possible; I think it can stuff things up if you use full-text search, since InnoDB didn't support FULLTEXT indexes before MySQL 5.6).

There is a bit about it here -

MyISAM Table lock issue

You can find loads more info with a quick Google.

This has fixed it. I'm now seeing about 60,000 successful connections per minute.
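
A sketch of how the conversion can be scripted: emit one ALTER statement per table so they can be reviewed before being piped into the mysql client. The table names here are placeholders; in practice you'd pull the MyISAM list from information_schema.tables, and skip any tables that need FULLTEXT indexes (MyISAM-only before MySQL 5.6).

```shell
# Generate one ALTER TABLE per table for review before running against MySQL.
# Table names are placeholders for your own schema.
stmts=$(for t in users orders sessions; do
  printf 'ALTER TABLE `%s` ENGINE=InnoDB;\n' "$t"
done)
echo "$stmts"
```

Once the output looks right, pipe it into `mysql your_db_name` (ideally during a quiet window, since each ALTER rebuilds the table).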

hokeycokey