
We're running into an interesting conundrum that I'd appreciate some help troubleshooting. We have a service made up of several process types. To distribute load, we can start up n processes of most types. For example, if we expect 200,000 connections and know that each process of a certain type can handle around 5,000 connections before pegging at 100% CPU, we know we need at least 40 processes of that type running to handle the load.
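The sizing arithmetic above can be written down as a small sketch. The function name `processes_needed` is hypothetical, and the numbers are simply the ones quoted in this question.

```python
def processes_needed(expected_connections: int, connections_per_process: int) -> int:
    """Smallest process count whose combined capacity covers the expected load."""
    if connections_per_process <= 0:
        raise ValueError("connections_per_process must be positive")
    # Integer ceiling division: round up so a partial process counts as a whole one.
    return -(-expected_connections // connections_per_process)


# Figures from the question: 200,000 connections, ~5,000 per process.
print(processes_needed(200_000, 5_000))
```

This only sizes for connection count; it says nothing about how per-process CPU behaves once the processes share a box, which is the problem described below.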

Recently, we've started consolidating our services to make better use of our hardware. During load testing, though, we've seen that changing nothing other than the number of processes of a certain type on a single box doubles the CPU% of each process.

Here's a screenshot of the process CPU%:

[Screenshot: Process CPU%]

Here's a screenshot of the host CPU%:

[Screenshot: Host CPU%]

The earlier test had about 12 instances of this process on the box; the later test doubled that count. This would make sense if the box just couldn't handle the load, but from what I see that doesn't look like the case.

top - 14:55:08 up 54 days, 18:30,  1 user,  load average: 22.26, 22.39, 22.03
Tasks: 581 total,   1 running, 580 sleeping,   0 stopped,   0 zombie
%Cpu(s): 32.8 us,  3.1 sy,  0.0 ni, 62.3 id,  0.0 wa,  0.0 hi,  1.7 si,  0.0 st
KiB Mem : 26385841+total, 16612808+free, 20537016 used, 77193320 buff/cache
KiB Swap:  4194300 total,  4194300 free,        0 used. 24167782+avail Mem

Load average is within range (this is a 28-core server with 256 GB of memory), and I/O wait (wa) is 0.0. I'm not sure what's causing the increased CPU%. Any ideas on what else to look for? Why does doubling the number of processes also double the CPU time each process requires, if the CPU (according to top) is actually underutilized?
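To make the host-versus-process distinction concrete, here is a rough sketch that converts the `%Cpu(s)` line from the `top` output above into "cores' worth of work" and then into an average per-process CPU%. It rests on several assumptions that the output itself does not confirm: that `top` is averaging over 28 logical CPUs, that these service processes account for nearly all of the busy time, and that the instance count is either 12 or 24. The function names are hypothetical.

```python
def busy_cores(cpu_pct: dict, logical_cpus: int) -> float:
    """Convert top's averaged %Cpu(s) figures into the number of cores kept busy."""
    # Count user, system, nice, hardware-interrupt and soft-interrupt time as busy;
    # idle (id), I/O wait (wa) and steal (st) are excluded.
    busy_pct = sum(cpu_pct[k] for k in ("us", "sy", "ni", "hi", "si"))
    return busy_pct / 100.0 * logical_cpus


def avg_process_cpu_pct(cores_busy: float, instances: int) -> float:
    """Average per-process CPU% if the busy cores were split evenly across instances."""
    return cores_busy / instances * 100.0


# Values copied from the %Cpu(s) line in the top snapshot.
snapshot = {"us": 32.8, "sy": 3.1, "ni": 0.0, "id": 62.3,
            "wa": 0.0, "hi": 0.0, "si": 1.7, "st": 0.0}

cores = busy_cores(snapshot, logical_cpus=28)  # 28 is an assumption
print(f"busy cores: {cores:.2f}")
for instances in (12, 24):  # assumed instance counts for the two tests
    print(f"{instances} instances -> {avg_process_cpu_pct(cores, instances):.1f}% each")
```

Under those assumptions the host is doing roughly 10.5 cores' worth of work, so a host that is about 62% idle can still have individual processes running at a substantial fraction of one core each.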

MrDuk
  • One process can only utilize one core at a given time and running 12 processes on a 28-core server would lead to ~42% CPU utilization. Add to that overhead from sys time or similar. Doubling the processes lets them use more CPU cores, so the graph would make sense. You should also investigate the usage with something like `htop`, which gives a better overview of the individual processes/threads and their utilization. – Thomas Jul 12 '17 at 17:14
  • "running 12 processes on a 28-core server would lead to ~42% CPU utilization." -- I don't think that's how it works. The CPU% of my process isn't the same thing as the overall usage of the CPU itself (unless the processes are pegged at 100%, then it'd match up pretty close). http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages – MrDuk Jul 12 '17 at 17:36
  • So your graph does not show the overall CPU utilization? What does it reflect? – Thomas Jul 12 '17 at 17:38
  • The CPU% of the processes themselves, sorry if that wasn't clear. I've updated my question with another screenshot. – MrDuk Jul 12 '17 at 17:38
  • Yes, I got your question wrong. I guess one would need deeper insight into the application to tell what is happening. Maybe investigating with `perf top` while running 12 processes and then 24 processes will give you a hint. – Thomas Jul 12 '17 at 18:16
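Alongside the `perf top` comparison suggested above, a low-tech way to compare the 12-process and 24-process runs is to sample each process's accumulated CPU time from `/proc/<pid>/stat` twice and compute CPU% over the interval. The following is a minimal sketch, assuming Linux; the function names are hypothetical, the sample lines are synthetic rather than taken from the real service, and the tick rate of 100 per second is an assumption (on a real host it can be read with `os.sysconf("SC_CLK_TCK")`).

```python
def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The comm field (field 2) is wrapped in parentheses and may contain spaces,
    # so split on the LAST ')' and parse the remaining whitespace-separated fields.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so fields 14 (utime) and 15 (stime)
    # sit at indexes 11 and 12.
    return int(after_comm[11]) + int(after_comm[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                elapsed_seconds: float, ticks_per_second: int = 100) -> float:
    """CPU% of one core used between two samples (100.0 means one full core)."""
    cpu_seconds = (ticks_after - ticks_before) / ticks_per_second
    return cpu_seconds / elapsed_seconds * 100.0


# Synthetic stat lines (truncated after a few fields past stime), 10 seconds apart.
before = "1234 (my service) S 1 1234 1234 0 -1 4194560 100 0 0 0 5000 1000 0 0 20 0 4"
after = "1234 (my service) S 1 1234 1234 0 -1 4194560 100 0 0 0 5400 1100 0 0 20 0 4"

print(cpu_percent(cpu_ticks(before), cpu_ticks(after), elapsed_seconds=10.0))
```

Splitting the result into user time (`utime`) and system time (`stime`) separately for both runs may also help show whether the extra CPU is spent in the application or in the kernel.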

0 Answers