Hope you are doing well. I'm running a WordPress site with WooCommerce on an OpenLiteSpeed web server that starts returning 504 timeout errors under heavy PHP usage. Everything is hosted on AWS, and I'm struggling to identify what causes the 504 errors and what could be improved to avoid them. Here are some details:
AWS setup:
- The web server runs on a t3.medium instance with Ubuntu 20.04 amd64 and 50 GB of EBS storage (I/O optimization enabled). About 10 GB is in use as of now.
- Running PHP 7.4 and
- I'm using two CloudFront distributions as a CDN: one to serve images (from S3) and the other to serve CSS/JS files.
- I have an ELB in front of the web server; its idle timeout is set to 300 seconds.
- I have a db.t3.small RDS instance (100 GB gp2) running MariaDB 10.5.13; the database size is about 1.5 GB.
- I'm using Redis ElastiCache with three cache.t3.micro nodes.
Site stats:
- Site has ~1,000 hits per week.
- About 350 product pages and 50 regular pages.
- Page size ranges from 500 KB to 13.5 MB.
What's the problem?
- The site times out and throws 504 errors during PHP-heavy operations: uploading products (and attaching images to them), uploading images, flushing the OLS cache several times (about 3-4) in a short span of time, or browsing the site by opening a bunch of product pages and adding them to the cart.
- EC2 CPUUtilization peaks at 99%, but network bandwidth seems okay, peaking at 2.0 Gb, and CPU credits remain steady.
- DB connections peak at 50 per minute and DB CPU utilization fluctuates between 20% and 30%.
- Burst credit remains steady.
- stderr.log shows a lot of "Reached max children process limit: 35, extra: 0, current: 35, busy: 35, please increase LSAPI_CHILDREN." messages.
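To put a number on "a lot", here is a minimal Python sketch of how the messages in stderr.log could be counted. It matches only the message text quoted above and ignores whatever prefix each line carries, since I am not sure the prefix format is the same on every install. The function name summarize_limit_hits is just something I made up for illustration.

```python
import re
from collections import Counter

# Matches the lsphp message quoted above; any prefix on the line is ignored.
LIMIT_RE = re.compile(
    r"Reached max children process limit: (\d+), extra: (\d+), "
    r"current: (\d+), busy: (\d+)"
)


def summarize_limit_hits(lines):
    """Return (hits, saturated, by_limit) for an iterable of log lines.

    hits      -- number of lines containing the limit-reached message
    saturated -- how many of those report every child as busy (busy == limit)
    by_limit  -- dict mapping each reported limit value to its message count
    """
    hits = 0
    saturated = 0
    by_limit = Counter()
    for line in lines:
        match = LIMIT_RE.search(line)
        if match is None:
            continue  # unrelated log line
        limit, _extra, _current, busy = map(int, match.groups())
        hits += 1
        by_limit[limit] += 1
        if busy == limit:
            saturated += 1
    return hits, saturated, dict(by_limit)
```

Passing an open file handle for stderr.log (for example `summarize_limit_hits(open(path))`, with the path depending on the install) gives the totals. If most hits are "saturated", all workers were busy at the same time rather than idle children merely being counted against the limit.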
Screenshots (EC2 instance):
What I tried so far:
- I tried increasing max connections and child processes to 350, but the timeouts remain.
- I increased the php.ini memory limit to 512 MB, but it did not make any difference.
- Tried increasing DB storage from 30 GB to 100 GB, no luck.
- Tried increasing EC2 instance storage from 30 GB to 50 GB, but again no luck.
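As a sanity check on the 350 figure, here is a rough back-of-the-envelope calculation in Python of how many PHP workers could fit in RAM. A t3.medium has 4 GiB of memory. The 1 GiB reserved for the OS and web server and the 100 MiB per typical worker are assumptions I have not measured, so treat the results as illustrative only.

```python
def max_workers_by_memory(total_ram_mib, reserved_mib, per_worker_mib):
    """Upper bound on PHP workers that fit in RAM without swapping."""
    usable = total_ram_mib - reserved_mib
    if usable <= 0 or per_worker_mib <= 0:
        return 0
    return usable // per_worker_mib


T3_MEDIUM_RAM_MIB = 4 * 1024  # t3.medium ships with 4 GiB of memory
RESERVED_MIB = 1024           # ASSUMPTION: OS, OpenLiteSpeed, agents, etc.
TYPICAL_WORKER_MIB = 100      # ASSUMPTION: unmeasured guess for a WooCommerce request
MEMORY_LIMIT_MIB = 512        # php.ini memory_limit from the list above

# Workers that fit if each one uses the assumed typical footprint.
typical_fit = max_workers_by_memory(T3_MEDIUM_RAM_MIB, RESERVED_MIB, TYPICAL_WORKER_MIB)

# Workers that fit if each one grows all the way to memory_limit.
worst_case_fit = max_workers_by_memory(T3_MEDIUM_RAM_MIB, RESERVED_MIB, MEMORY_LIMIT_MIB)

# Memory that 350 workers would need if all of them hit memory_limit at once.
worst_case_demand_gib = 350 * MEMORY_LIMIT_MIB / 1024

print(typical_fit, worst_case_fit, worst_case_demand_gib)
```

Under these assumptions, only a few dozen workers fit in memory, and 350 workers at the 512 MB limit would need far more RAM than the instance has. That might explain why raising the limit to 350 did not help, but I would need real per-process numbers to confirm it.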
Questions/Help needed:
- Based on my setup, which metrics (and which aggregation) should I look at to pinpoint the root cause of the timeouts? AWS exposes so much information that I'm confused about what could actually move the needle.
- Should I scale up my EC2 instance for more CPU power? Or should I scale up my RDS instance? Or neither? I'm budget-constrained, so scaling up is not really feasible.
- Is there any configuration on the web server that I could try? I could upload my conf file if that helps.
- Should I just move everything to managed hosting and live happily ever after?
Thanks in advance