After days of debugging and tweaking settings, I'm exhausted and unable to find a solution. Please advise.

I have the following server on DigitalOcean:

64GB Memory
8 Core processor
200GB SSD drive

I'm running a single WordPress site on it. The site gets high traffic (2000 to 3000 concurrent real-time users), and I'm sure that because of my bad settings I'm losing traffic and failing to serve pages to users. I expect 5000+ real-time users, but the count always stays around 2000.

I constantly get OOM errors, which cause mysql or php5-fpm to be killed and the site to go down. If I tweak php-fpm and nginx, I get 502 and 503 errors, or I get `upstream timed out (110: Connection timed out)` or `FastCGI sent in stderr: PHP message: PHP Fatal error: Maximum execution time of 30 seconds exceeded`.

Now I've tweaked the settings so that I don't get any errors, but traffic has dropped to around 1500 concurrent users and refuses to go up. So I'm sure something is wrong in my settings.

/etc/php5/fpm/pool.d/www.conf settings:

pm = dynamic
pm.max_children = 500
pm.start_servers = 150
pm.min_spare_servers = 100
pm.max_spare_servers = 200
pm.max_requests = 5000

FastCGI settings: /etc/nginx/conf.d/default.conf

location ~ \.php$ {
    try_files $uri =404;
    # proxy buffers - no 502 errors!
    proxy_buffer_size               128k;
    proxy_buffers                   4 256k;
    proxy_busy_buffers_size         256k;

    fastcgi_buffers 256 16k;
    fastcgi_buffer_size 128k;
    fastcgi_max_temp_file_size 0;
    fastcgi_intercept_errors on;
    fastcgi_keep_conn off;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/dev/shm/php-fpm-www.sock;
}

APC setting: /etc/php5/fpm/php.ini

[apc]
apc.write_lock = 1
apc.slam_defense = 0
apc.shm_size = "1024M"

I've noticed that php5-fpm processes take a lot of memory. For example, when I calculate the average memory per process with `ps --no-headers -o "rss,cmd" -C php5-fpm | awk '{ sum+=$1 } END { printf ("%d%s\n", sum/NR/1024,"M") }'`, I get 238M at a concurrent traffic of 1100.
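To put that figure in context, here is a rough sketch of the memory arithmetic it implies for the pool settings above. The RAM size, the 238M average and pm.max_children = 500 are taken from this post. The 75% share of RAM reserved for PHP-FPM is purely an assumption for illustration, since mysql, nginx and Varnish also need memory. RSS also counts shared pages (such as the APC segment) once per process, so the real total is probably lower than this estimate.

```python
# Rough memory-budget arithmetic for the php5-fpm pool described above.
# Inputs come from the post, except PHP_SHARE, which is an assumed value.

TOTAL_RAM_MB = 64 * 1024   # 64GB server
AVG_RSS_MB = 238           # average RSS per php5-fpm process (ps/awk output)
MAX_CHILDREN = 500         # pm.max_children in www.conf
PHP_SHARE = 0.75           # ASSUMPTION: fraction of RAM left for PHP-FPM


def worst_case_mb(children: int, rss_mb: int) -> int:
    """Upper-bound memory use if every child reaches the average RSS."""
    return children * rss_mb


def children_that_fit(total_mb: int, rss_mb: int, share: float) -> int:
    """Number of children of size rss_mb that fit in share * total_mb."""
    return int(total_mb * share) // rss_mb


if __name__ == "__main__":
    print("worst case (MB):", worst_case_mb(MAX_CHILDREN, AVG_RSS_MB))
    print("total RAM (MB): ", TOTAL_RAM_MB)
    print("children that fit:",
          children_that_fit(TOTAL_RAM_MB, AVG_RSS_MB, PHP_SHARE))
```

Under these assumptions, 500 children at 238M each would need far more than the 64GB available, which would be consistent with the OOM kills described earlier.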

Please show me where my config is incorrect, because I'm 100% sure my traffic is choking.


Additional info

Nginx config: /etc/nginx/nginx.conf

worker_processes  12;
worker_rlimit_nofile 20000;


events {
    worker_connections  3000;
    use epoll;
    multi_accept on;
}

But I've noticed that the ulimit on the server is low: `ulimit -n` shows only 1024. Is this related to my issue?
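One caveat with that check: `ulimit -n` in an interactive shell reports the soft limit of that shell, which is not necessarily the limit of the running nginx workers (worker_rlimit_nofile is meant to raise it for them). A minimal sketch for reading the limit of a specific process from /proc on Linux follows. The helper names are hypothetical, and the PID of an nginx worker has to be supplied by hand; the example only inspects its own process.

```python
import os
import resource


def parse_max_open_files(limits_text: str):
    """Return (soft, hard) as strings from the text of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: "Max", "open", "files", <soft>, <hard>, "files"
            fields = line.split()
            return fields[3], fields[4]
    return None


def limits_for_pid(pid: int):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_max_open_files(fh.read())


if __name__ == "__main__":
    # Limit of this Python process, inherited from the launching shell.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("this process (getrlimit):", soft, hard)
    # Same information via /proc; replace os.getpid() with a worker PID.
    print("this process (/proc):    ", limits_for_pid(os.getpid()))
```

If the value reported for an nginx worker matches worker_rlimit_nofile rather than 1024, the shell's `ulimit -n` output is probably not the limiting factor for nginx itself.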

  • Is your WordPress up to date? Have you checked all your plugins and their memory usage? – Tero Kilkanen Sep 13 '16 at 19:01
  • What fraction of the traffic is being served directly from the Varnish cache? What processes are using CPU? Unless your users are all logged in, you should be able to get 95+% cache hits, massively reducing the resources required. I do this with the Nginx page cache. Google "Nginx microcaching" if you're worried about pages that change frequently. You should also be using a CDN (e.g. CloudFlare) for a website this busy. Check out this tutorial, parts 4 and 6. https://www.photographerstechsupport.com/tutorials/hosting-wordpress-on-aws-tutorial-part-4-wordpress-website-optimization/ – Tim Sep 13 '16 at 19:08
  • @TeroKilkanen Yes, WP is latest. I use just the bare minimum plugins, no shady plugins. All of them are from top developers. – LittleLebowski Sep 14 '16 at 03:38
  • @Tim Here is my average hit rate from `varnishstat`: `0.9168 0.8600 0.8600`. Only the authors (maximum 15 users) are logged in. I haven't used nginx microcaching; I'll check it out. I use the CloudFlare business plan and it has saved my a** more times than I can count. It's good. Are my settings correct? Do you want to have a look at the Varnish VCL file? – LittleLebowski Sep 14 '16 at 03:43
  • Main questions are what processes are using CPU (ie show us the top / htop), and why is it hitting PHP and the database so often? Look at the caching headers coming out of Wordpress, they're often wrong and will destroy cache stats, but you can rewrite them in Nginx. I have no knowledge of Varnish, I've always used Nginx page caching, but if you're serving 91% cached pages that's great. If you do caching right you could even have CloudFlare serve the pages, with a TTL of as little as 2 mins or as much as 12 hours. – Tim Sep 14 '16 at 04:18
