For the past few weeks I've been getting more and more reports of lag on one of my sites. Over the last week I've finally experienced it firsthand, but I haven't been able to pinpoint the problem.
The server's load average never goes above about 0.5 on a 16-core machine, and memory usage tops out at around 12-13%. The database doesn't appear to be the cause, since the lag also happens on static resources. About 1 in 10 page views gets a 502 error, and about 1 in 5 pages takes 5-20 seconds to load. In Chrome's network tab, almost all of that time shows as "waiting", meaning the browser has sent the request and is waiting for the first byte of the response.
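To put firmer numbers on those rates, something like the following could be run from a machine outside the server. This is only a rough sketch using Python's standard library: the function names `sample` and `summarize`, the 5-second threshold, and the example URL are all placeholders I made up, and the timing includes DNS lookup and connection setup, so it only approximates what Chrome reports as "waiting".

```python
import time
import urllib.error
import urllib.request


def sample(url, timeout=30.0):
    """Fetch url once; return (status, seconds until the response headers arrive)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # urlopen returns once the status line and headers have been read.
            return resp.status, time.perf_counter() - start
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including 502) are raised as HTTPError.
        return err.code, time.perf_counter() - start
    except (urllib.error.URLError, OSError):
        # Connection failure or timeout: no status code available.
        return None, time.perf_counter() - start


def summarize(samples, slow_after=5.0):
    """Report what fraction of samples were 502s or took at least slow_after seconds."""
    total = len(samples)
    bad_gateway = sum(1 for status, _ in samples if status == 502)
    slow = sum(1 for _, secs in samples if secs >= slow_after)
    return {
        "requests": total,
        "rate_502": bad_gateway / total if total else 0.0,
        "rate_slow": slow / total if total else 0.0,
    }


# Example usage (hypothetical URL; point it at one of the static resources):
# results = [sample("https://example.com/static/logo.png") for _ in range(50)]
# print(summarize(results))
```

Running this at intervals after a reboot would show whether the 502 and slow-response rates climb gradually or appear all at once.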
I rebooted the server last night, and it seemed okay for a few hours, but less than 12 hours later the same lag was back. Does anyone have any tips on where I can look to figure out the problem?