OpenLiteSpeed server on EC2 times out on small ecommerce site

Question

Hope you are doing well. I'm running a WordPress site with WooCommerce on an OpenLiteSpeed web server that starts returning 504 (timed-out) errors under heavy PHP usage. Everything is hosted in AWS, and I'm struggling to identify the cause of the 504 errors and what could be improved to avoid them. Here are some details:

AWS setup:

The web server is installed on a t3.medium instance with Ubuntu 20.04 amd64 and 50 GB of EBS storage (I/O optimization enabled). About 10 GB is in use as of now.
Running PHP 7.4 and
I'm using two CloudFront distributions as a CDN: one to serve images (from S3) and the other to serve CSS/JS files.
I have an ELB to manage traffic to the web server; the idle timeout is set to 300 seconds.
I have a db.t3.small RDS instance (100 GB gp2) running MariaDB 10.5.13; the database size is about 1.5 GB.
I'm using Redis ElastiCache with three cache.t3.micro nodes.

Site stats:

Site has ~1,000 hits per week.
About 350 product pages and 50 other pages.
Page size ranges from 500 KB to 13.5 MB.

What's the problem?

The site times out and returns 504 errors during heavy PHP operations such as uploading products (and attaching images to them), uploading images, flushing the OLS cache multiple times (about 3-4) in a short span of time, or navigating through the site opening a bunch of product pages and adding them to the cart.
EC2 CPUUtilization shows max peaks at 99%, but network bandwidth seems okay, reaching max peaks at 2.0 GB, and CPU credits remain steady.
DB connections peak at 50 per minute and CPU utilization fluctuates between 20% and 30%.
Burst credit remains steady.
stderr.log shows a lot of "Reached max children process limit: 35, extra: 0, current: 35, busy: 35, please increase LSAPI_CHILDREN." messages.
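
To check whether those stderr.log messages line up with the 504s, the warnings can be counted per minute. This is a minimal sketch, not an official LiteSpeed tool: the message pattern is taken from the line quoted above, but the "YYYY-MM-DD HH:MM" timestamp prefix and the function name summarize_limit_hits are assumptions, so lines without such a prefix are counted under "unknown".

```python
import re
from collections import Counter

# Pattern copied from the stderr.log message quoted in the question.
LIMIT_RE = re.compile(
    r"Reached max children process limit: (?P<limit>\d+), "
    r"extra: (?P<extra>\d+), current: (?P<current>\d+), busy: (?P<busy>\d+)"
)
# ASSUMPTION: log lines begin with "YYYY-MM-DD HH:MM"; adjust to the real format.
MINUTE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2})")


def summarize_limit_hits(lines):
    """Return (warnings per minute, sorted list of distinct limits seen)."""
    per_minute = Counter()
    limits = set()
    for line in lines:
        hit = LIMIT_RE.search(line)
        if hit is None:
            continue  # not a "max children" warning
        stamp = MINUTE_RE.match(line)
        per_minute[stamp.group(1) if stamp else "unknown"] += 1
        limits.add(int(hit.group("limit")))
    return per_minute, sorted(limits)


# Tiny in-memory demo; with a real file, pass the open file object instead.
demo = [
    "2022-03-30 02:41:07 Reached max children process limit: 35, "
    "extra: 0, current: 35, busy: 35, please increase LSAPI_CHILDREN.",
    "2022-03-30 02:41:09 some unrelated line",
]
counts, seen_limits = summarize_limit_hits(demo)
print(dict(counts), seen_limits)
```

Minutes with many warnings can then be compared against the timestamps of the 504 responses in the access log. If the limit still reads 35 after the setting was changed, that would suggest the new value is not being picked up.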

Screenshots (EC2 instance):

CPUUtilization%

NetIn+NetOut

CPU Credits balance

What I tried so far:

I tried increasing max connections and children processes to 350, but the timeouts remain.
I increased the php.ini memory limit to 512 MB, but it made no difference.
Tried increasing DB storage from 30 GB to 100 GB, no luck.
Tried increasing EC2 instance storage from 30 GB to 50 GB, but again no luck.
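
For context on the first two attempts, the children limit and the PHP memory limit can be put next to the instance's RAM. This is a rough back-of-the-envelope sketch under stated assumptions: 4 GB of RAM for the t3.medium, a made-up 1 GB reserved for the OS and OpenLiteSpeed, and memory_limit treated as a worst-case ceiling per PHP process (real processes usually use less). The function names are hypothetical.

```python
def worst_case_php_memory_mb(children, memory_limit_mb):
    """Upper bound if every child process hit its memory_limit at once."""
    return children * memory_limit_mb


def children_that_fit(total_ram_mb, reserved_mb, per_child_mb):
    """How many children fit in RAM at a given per-child footprint."""
    return max((total_ram_mb - reserved_mb) // per_child_mb, 0)


TOTAL_RAM_MB = 4096   # t3.medium, 4 GB
RESERVED_MB = 1024    # ASSUMPTION: headroom for OS and web server
MEMORY_LIMIT_MB = 512  # php.ini memory limit from the question

for children in (35, 350):
    worst = worst_case_php_memory_mb(children, MEMORY_LIMIT_MB)
    print(f"{children} children x {MEMORY_LIMIT_MB} MB = {worst} MB worst case")

print("children that fit at the ceiling:",
      children_that_fit(TOTAL_RAM_MB, RESERVED_MB, MEMORY_LIMIT_MB))
```

Under these assumptions, 35 children at the ceiling already exceed the instance's RAM several times over, so a limit of 350 mostly moves the bottleneck to memory and the two vCPUs rather than removing it. Measuring the actual per-process footprint (for example with top) would give a more realistic number than the ceiling used here.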

Questions/Help needed:

Based on my setup, what metrics (and their aggregation) should I look at in order to pinpoint the root causes of the timeouts? AWS has so much information that I'm confused about what could actually move the needle.
Should I scale up my EC2 instance to get more CPU power? Or should I scale up my RDS instance? Or neither? I'm budget-constrained, so this option is not really feasible.
Is there any configuration on the web server that I could try? I could upload my conf file if that helps.
Should I just move everything to a managed-hosting and live happily ever after?

Thanks in advance

For a 2 CPU 4 GB RAM server, purge cache should not cause the PHP timeout issue. Maybe you can submit a ticket to support@litespeedtech.com for further assistance. — Eric, Mar 30 '22 at 02:41
1000 hits per week is one request every _ten minutes_, which is idle; you have a massive amount of hardware for that tiny load. Or is your load higher? How are you using 2GB per minute? That's 86TB/month, which is MASSIVE for 1000 hits per week. There's no way your CPU should be at 100%. Look at that as your primary problem, and use the Linux "top" utility as a starting point. Your instance will be out of CPU credits due to the CPU being pegged at 100%, running on baseline which is 20% of a core, which could cause PHP timeouts. I think you need to review your question for accuracy. — Tim, Mar 30 '22 at 06:44
@Tim thanks for your inputs. I edited my question to clarify that I meant max peaks and not average per minute, CPU credits look unchanged. I added links to the CPUUtilization, NetIn+NetOut, and CPU credit balance graphs for the instance. — aldo_91, Mar 30 '22 at 17:12
That's better. Are you sure it's only 1000 page requests per week? That's very low for that CPU utilization. Please turn on PHP access / error logs and reproduce the problem. Edit your question to include the web server access log, PHP access / error log, and web server error log for that single request. Ideally do that for a few different requests. I suspect the problem is in PHP, which is very CPU hungry, but your CPU level is fine and CPU credits are good. — Tim, Mar 30 '22 at 17:41
Another possible diagnostic step is to stop your instance, change it to a large instance type for 15 minutes (m5.4xlarge or something), try to reproduce the problem, stop it, and change it back. Even better do this with a second instance restored from a snapshot so your site doesn't go down if you can manage it, and use a spot instance to reduce costs. — Tim, Mar 30 '22 at 17:42
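
The load figures discussed in the comments can be checked with a few lines of arithmetic. This is only a sketch of that calculation: it uses the 1,000 hits per week from the question and the "2GB per minute" reading from Tim's first comment, assumes a 30-day month and decimal units (1 TB = 1000 GB), and the function names are made up.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080


def minutes_between_hits(hits_per_week):
    """Average gap between requests, in minutes."""
    return MINUTES_PER_WEEK / hits_per_week


def monthly_transfer_tb(gb_per_minute, days=30):
    """Transfer per month if a per-minute rate were sustained (decimal TB)."""
    return gb_per_minute * 60 * 24 * days / 1000


print(f"{minutes_between_hits(1000):.2f} minutes between hits")
print(f"{monthly_transfer_tb(2.0):.1f} TB per month at 2 GB per minute")
```

This gives roughly one request every ten minutes and about 86 TB per month if 2 GB per minute were sustained, which is why the edit clarifying that 2.0 GB is a peak rather than an average matters.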
