We have a web application (LAMP stack) behind Traefik as a reverse proxy that has suddenly started returning HTTP 502 and 504 errors on more than 50% of requests, for both static files and PHP scripts. The Traefik dashboard shows a count of these errors, but its logs don't reveal anything further. My first suspicion was that Apache was timing out, possibly from being overloaded.

However, looking at the Apache logs, I only see successfully processed requests, as if Apache never receives the requests that fail. We haven't seen any spikes in usage, server CPU utilization hovers around 60% as is typical, and we have confirmed there is ample disk space. I'm at a loss as to how to diagnose what specifically is going on and how to fix it.

For further context, the application is dockerized, with Traefik, Apache, and MySQL each running in its own container, and it runs on a DigitalOcean VPS. The software versions are as follows:

Apache: 2.4.57, PHP: 7.2.34-39, Traefik: 1.7.33, MySQL: 14.14 Distrib 5.7.35

Any insight or suggestions would be greatly appreciated!

bkane521
  • Why do you need traefik while having Apache? It's not a LAMP stack if there's Traefik in between, no? – Marcel Jul 06 '23 at 15:12
  • traefik isn't strictly necessary in the current setup, I believe the thinking was preemptive preparation for load balancing that hasn't come to fruition – bkane521 Jul 06 '23 at 15:31
  • I'd remove traefik from the stack to see how the rest of the components will behave without it and if it improves the experience for now. – Marcel Jul 06 '23 at 15:39
  • Actually my mistake, traefik is handling SSL termination, there's some automation setup around that. Still not strictly necessary but it does serve a purpose. – bkane521 Jul 06 '23 at 16:15
  • My current suspicion is this is an issue with the volume of requests (~700 per second) and traefik is giving 504 when apache doesn't respond in time. It doesn't explain the 502 which I would expect if apache was returning an error, but I don't see any errors at all in the apache logs. – bkane521 Jul 06 '23 at 16:30
  • Another interesting tidbit: when the containers are restarted, the first 2000-3000 requests all succeed, and right around that point the 5xx errors start appearing alongside the successful responses. – bkane521 Jul 06 '23 at 16:33
  • https://support.stackpath.com/hc/en-us/articles/360001458723-Learn-and-Troubleshoot-502-and-504-Errors#:~:text=A%20502%20or%20a%20504,will%20return%20a%205xx%20error. – Rick James Jul 07 '23 at 04:20
  • Given your current suspicion about the volume of requests, have you tried enabling keep-alive connections to and from Apache? I'd start by tweaking the timeout values to see if that improves the 504s. The next thing would be to enable keep-alive everywhere so the kernel can reuse sockets. HTTP/2 would also help, AFAIK. – Marcel Jul 07 '23 at 07:16
  • Additional DB information request, please. OS, Version? RAM size, # cores, any SSD or NVME devices on MySQL Host server? Post TEXT data on justpaste.it and share the links. From your SSH login root, Text results of: A) SELECT COUNT(*), sum(data_length), sum(index_length), sum(data_free) FROM information_schema.tables; B) SHOW GLOBAL STATUS; after minimum 24 hours UPTIME C) SHOW GLOBAL VARIABLES; D) SHOW FULL PROCESSLIST; E) STATUS; not SHOW STATUS, just STATUS; G) SHOW ENGINE INNODB STATUS; for server workload tuning analysis to provide suggestions. – Wilson Hauck Jul 10 '23 at 05:18
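
One way to test the suspicion raised above (that Traefik is answering with 502/504 for requests Apache never logs) is to tally status codes from Traefik's own access log rather than from the dashboard counter. The following is only a minimal sketch under stated assumptions: it assumes the Traefik access log has been enabled in JSON format and that each entry carries `DownstreamStatus` and `BackendAddr` fields. The helper names `tally_field` and `gateway_error_share` and the file name `access.log` are hypothetical and used only for illustration.

```python
import json
from collections import Counter


def tally_field(lines, field="DownstreamStatus"):
    """Count how often each value of `field` occurs in JSON log lines.

    Assumption: one JSON object per line; lines that are blank or not
    valid JSON objects are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if isinstance(entry, dict):
            counts[entry.get(field)] += 1
    return counts


def gateway_error_share(counts, error_codes=(502, 504)):
    """Return the fraction of tallied entries whose status is a gateway error."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    errors = sum(n for status, n in counts.items() if status in error_codes)
    return errors / total


if __name__ == "__main__":
    # "access.log" is a placeholder path, not taken from the setup above.
    try:
        with open("access.log", encoding="utf-8") as fh:
            status_counts = tally_field(fh)
    except FileNotFoundError:
        status_counts = Counter()
    print(dict(status_counts))
    print(f"502/504 share: {gateway_error_share(status_counts):.1%}")
```

Calling `tally_field(lines, field="BackendAddr")` on only the failing entries would show whether the errors are tied to one backend address. Comparing the 502/504 count with the number of requests in the Apache access log for the same time window would indicate whether the failing requests ever reach Apache.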
