
I have an ASP.NET app (.NET Framework 4.8) that occasionally hits 100% CPU usage for periods of a couple of msec. Importantly, the app does not experience bursts of client RPS during or right before these CPU spikes. It is actually serving only a couple of client requests before each spike.

Viewing the PerfView dump in WPA with the CPU Usage (Sampled) graph, I see that the peaks of the CPU spikes, as well as their slopes, are filled with CPU samples whose stacks lead to the Dequeue and TrySteal methods. System metrics also show that during the CPU load the app experiences a burst of used worker threads (ThreadPool.GetAvailableThreads - ThreadPool.GetMinThreads), up to the number I set with ThreadPool.SetMinThreads. The machine has 16 cores, and I tested the app with totals of 2048 and 512 workers across all cores, i.e. 128 and 32 workers per core respectively.
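For reference, a minimal sketch of how such thread pool figures can be sampled in code. It assumes the common approximation "busy workers = GetMaxThreads - GetAvailableThreads", which is not necessarily the exact metric used above; `ThreadPoolProbe`, `BusyWorkerThreads` and `WorkersPerCore` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Threading;

public static class ThreadPoolProbe
{
    // Hypothetical helper: worker threads currently in use, approximated as
    // (maximum worker threads) - (available worker threads).
    public static int BusyWorkerThreads()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return maxWorkers - freeWorkers;
    }

    // Hypothetical helper: a total minimum worker count spread evenly over the cores.
    public static int WorkersPerCore(int totalMinWorkers, int cores)
    {
        return totalMinWorkers / cores;
    }

    public static void Main()
    {
        int minWorkers, minIo;
        ThreadPool.GetMinThreads(out minWorkers, out minIo);

        // Log the configured minimum next to the current usage.
        Console.WriteLine("min workers: {0}, busy workers: {1}",
            minWorkers, BusyWorkerThreads());

        // The two configurations tested on the 16-core machine.
        Console.WriteLine("2048 total: {0} per core, 512 total: {1} per core",
            WorkersPerCore(2048, 16), WorkersPerCore(512, 16));
    }
}
```

Sampling `BusyWorkerThreads()` on a timer and logging it alongside request counts might help correlate worker bursts with the CPU spikes.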

For now, it looks like the CPU load is caused by a large number of worker threads trying to pick up work requests when none are available. The workers waste CPU looking for work requests in their local queues and the global thread pool queue, and trying to steal work from other threads' local queues.

What might cause such bursts in the number of worker threads? Can 16 CPU cores really be starved with 512 workers trying to find work, or is this just a consequence of some other problem?

The attachments illustrate:

1) CPU sample distribution across the stacks of all app threads

2) CPU sample distribution within the stack of a single random app thread


idementia
  • "Can 16 CPU cores really be starved with 512 workers": it seems so, especially in a highly complex environment like .NET's VM. What is the point of having so many workers anyway? – freakish Nov 26 '19 at 08:47
  • The point is that we have used such a multiplier of 128 or 32 workers per core in many apps for years, and it actually worked well before, meaning each core was able to serve up to 128 workers in parallel pretty well. Even though our apps are considered to be high-load, they seem to behave as fault tolerant. The app above actually lived well through a load of 200 RPS for quite some time. It is far from obvious why it started to experience CPU usage issues. – idementia Nov 26 '19 at 09:05
  • You need to see the managed call stacks of the threads consuming CPU to get a better idea. – kvr Nov 11 '22 at 23:29

0 Answers