
I'm running some fairly processor-intensive stuff on my PC, and I've noticed that my CPU usage looks pretty odd. My PC is a quad-core i7-870, which supposedly has eight virtual cores.
I'm using the Task Parallel Library in .NET 4, so I expected all cores to be nicely utilised, but I'm getting information like this from Process Monitor:

[Screenshot: per-core CPU usage graphs]

Cores 6 and 8 are hardly getting touched, and apart from a brief burst, 4 isn't either.
Is this what I should expect?
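For reference, here is a minimal hypothetical sketch of the kind of TPL usage involved (this is a stand-in, not my actual code; the loop body just burns CPU):

```csharp
// Hypothetical stand-in for the real processor-intensive work,
// parallelised with the .NET 4 Task Parallel Library.
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        Parallel.For(0, 1000, i =>
        {
            // CPU-bound busy work in place of the actual algorithm.
            double x = i;
            for (int j = 0; j < 10000000; j++)
                x = Math.Sqrt(x + j);
        });
    }
}
```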

DefenestrationDay
  • Computers generally operate as intended, so you should be looking into what it is about your algorithm that doesn't scale. You didn't say what your algorithm is. Even if you ignore the virtual processors, your CPU utilisation is very poor. – David Heffernan Jun 10 '11 at 06:00
  • (a) yes (b) profile (c) YMMV - hyperthreading is not multi-coring and as such depends heavily on the type of instruction load and cache saturation etc. – sehe Jun 10 '11 at 06:01
  • @David - by 'poor', you mean 'low'? Even with an average 60% utilisation, I would have (naively, perhaps?) expected the OS to share that out a little better amongst the virtual cores... – DefenestrationDay Jun 10 '11 at 06:09
  • @Cap: Affinity keeps threads running on the same core. If you're not using all cores that's a **good** thing. – Rick Sladkey Jun 10 '11 at 06:14
  • The OS will prefer to run threads on the processor that last ran the thread, for better cache performance. Since your utilisation is so low, this means some processors are ignored. – David Heffernan Jun 10 '11 at 06:15
  • I think you are suffering from contention on a lock. Does that sound plausible? – David Heffernan Jun 10 '11 at 06:16
  • @David: Plausible - sure: I'll look into it. I was expecting 100% for this process, actually. It just surprised me with the load (un)balancing... – DefenestrationDay Jun 10 '11 at 06:20
  • @CapsicumDreams: there's nothing to be gained by sharing it out between all cores. It's often more efficient to run one core at 100% than two cores at 50% each (depends on the exact cache behavior of your code) – jalf Jun 10 '11 at 06:20
  • @Cap: Another thing to measure is your GC load. Even with the background GC of .NET 4, if your GC load is too high, threads will block and you'll never achieve full core utilization. – Rick Sladkey Jun 10 '11 at 06:31

3 Answers


For the most part, yes, I think this looks reasonable. Keep in mind that hyperthreading really just fakes two cores: each physical core presents two logical cores that can track two instruction streams in parallel, but those streams still share the same execution units. So when one HT core is busy, the execution units are taken, and its "twin" core will be able to do very little work.

That seems to be what you're seeing on the first two cores (the second in particular makes it very obvious).

Apart from this, you'll almost never be able to get perfect CPU utilization. Sometimes, a core just has to stall waiting for memory. Sometimes it's executing a costly non-pipelined instruction, effectively blocking the execution units on that physical core for perhaps tens or even hundreds of cycles.

And sometimes, dependencies between instructions might just mean that you don't have anything for one or more cores to execute.

Apart from that, you see 8 graphs, and you only have 4 cores, so yes, of course hyperthreading is working. ;)
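To make the utilization point concrete, here's a hedged illustration (hypothetical code, nothing from the OP): two TPL loops, one whose iterations are serialized by a shared lock and one whose iterations are independent. On a hyperthreaded quad-core, the contended version typically produces exactly this kind of uneven, partial per-core load, while the independent version comes much closer to saturating all logical cores.

```csharp
// Contrast: lock-contended vs. independent parallel work (.NET 4 TPL).
using System;
using System.Threading.Tasks;

class ContentionDemo
{
    static readonly object Gate = new object();

    static void Main()
    {
        // Contended: every iteration waits on the same lock, so most
        // logical cores sit idle while one does the work.
        Parallel.For(0, 8, i =>
        {
            lock (Gate) { Spin(); }
        });

        // Independent: no shared state, so the scheduler can keep all
        // logical cores busy (subject to HT execution-unit sharing).
        Parallel.For(0, 8, i => Spin());
    }

    static void Spin()
    {
        // CPU-bound busy work.
        double x = 1.0;
        for (int j = 0; j < 50000000; j++)
            x = Math.Sqrt(x + j);
    }
}
```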

jalf
  • When it waits for memory, that's counted as CPU time, so you can still get 100% utilisation even if your memory usage is terrible. – David Heffernan Jun 10 '11 at 06:13
  • actually for many algorithms it's easy to get 100% utilisation – David Heffernan Jun 10 '11 at 06:17
  • Fair enough. But even if it's just inter-thread dependencies, that'll still explain why the OP doesn't see 100% utilization on all cores :) – jalf Jun 10 '11 at 06:18
  • There has to be locking to see this. Memory contention and stalling will cripple runtimes but show up as full CPU utilisation. – David Heffernan Jun 10 '11 at 06:21
  • @David: but since we don't know which algorithm the OP is running, nor how it is executed..... I'm going to go ahead and state that his algorithm, or the specific implementation of it, obviously does not scale to use 8 cores perfectly. ;) – jalf Jun 10 '11 at 06:21
  • as I commented some time ago ;) – David Heffernan Jun 10 '11 at 06:24

In short

  1. yes it works (of course)
  2. profile it
  3. YMMV - hyperthreading is not multi-coring, and as such it depends heavily on the type of instruction load, cache saturation, etc. Not knowing anything about your code (except that it is C#, really), you might look for collections of 'small objects' that could be made into straight arrays of structs (a generic List<> will also use an array internally and optimize for struct element types); see the sketch below this list.
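
A hedged sketch of point 3, using hypothetical Point types (none of this is from the question's code): a collection of small reference-type objects scatters each instance across the heap, while an array of structs stores the elements contiguously, which is friendlier to the caches that hyperthreaded cores share.

```csharp
using System;
using System.Collections.Generic;

class PointClass { public double X, Y; }   // each instance is a separate heap allocation

struct PointStruct { public double X, Y; } // instances pack contiguously in an array

class LayoutDemo
{
    static void Main()
    {
        // Array of references: the PointClass instances live anywhere on the heap.
        var objects = new PointClass[1000];
        for (int i = 0; i < objects.Length; i++)
            objects[i] = new PointClass { X = i, Y = i };

        // Array of structs: all 1000 elements are one contiguous block of memory.
        var structs = new PointStruct[1000];
        for (int i = 0; i < structs.Length; i++)
            structs[i] = new PointStruct { X = i, Y = i };

        // A List<PointStruct> keeps the same contiguous layout in its
        // internal PointStruct[] backing array.
        var list = new List<PointStruct>(structs);
        Console.WriteLine(list.Count);
    }
}
```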

$0.02

sehe
  • it's going to be more complex than your item 3. All that red in the task manager is kernel time. Utilisation is dreadful. Optimisations based on structs or arrays won't solve the fundamental problem, which looks very like lock contention. – David Heffernan Jun 10 '11 at 06:07
  • Not sure about more complex (could be an easy thing), but even with saturation you should see a more equal load distribution. CPU time would get wasted (not doing anything) but still show high utilization. – TomTom Jun 10 '11 at 06:09
  • The funniest thing is that some cores seem to be totally ignored. Even with 2-3 threads it would move them between cores at times. – TomTom Jun 10 '11 at 06:10
  • @tom that's not really odd at all, it's exactly what you would expect – David Heffernan Jun 10 '11 at 06:25

It all depends on your algorithm's implementation. The TPL will use an appropriate number of cores depending on the data dependencies in your algorithm.
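
For what it's worth, the TPL's degree of parallelism can also be capped explicitly via ParallelOptions (a real .NET 4 API); here is a minimal sketch with a hypothetical CPU-bound body:

```csharp
using System;
using System.Threading.Tasks;

class DegreeDemo
{
    static void Main()
    {
        // Cap the parallelism ceiling at the number of logical processors
        // (8 on a hyperthreaded quad-core i7-870).
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.For(0, 100, options, i =>
        {
            // Independent, CPU-bound work scales best.
            double x = i;
            for (int j = 0; j < 1000000; j++)
                x = Math.Sqrt(x + j);
        });
    }
}
```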

Ankur