
Whenever I try to add up the CPU utilization percentages from commands like `top` or `mpstat`, and in particular from the collectd service, I can't get them to total exactly 100%.

For example, here is `top` output from a test server on Amazon EC2:

Cpu(s): 13.6%us, 31.6%sy,  0.0%ni, 53.2%id,  0.0%wa,  0.0%hi,  0.0%si,  1.7%st

No matter how I add up the percentages, I never quite get 100% CPU, certainly not in any logical way. Mostly it looks like rounding error (100.1% or 99.9%), but sometimes I end up with over 110%. This usually happens when steal is relatively high. In one case collectd reported ~21.44% steal and ~88% idle, and those two alone are already well over 100%. I understand that ni (nice) is also counted in us (user), so I shouldn't add it, but the numbers still don't work out.
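To make the arithmetic concrete, here is a minimal sketch in Python that sums the fields of the `top` line quoted above. The regular expression and variable names are only illustrative; the line itself is copied verbatim. The eight fields come to 100.1, which is the kind of small overshoot that rounding each field to one decimal place would explain.

```python
import re

# The Cpu(s) line quoted above, copied verbatim.
line = "Cpu(s): 13.6%us, 31.6%sy,  0.0%ni, 53.2%id,  0.0%wa,  0.0%hi,  0.0%si,  1.7%st"

# Pull out (value, label) pairs such as ("13.6", "us").
fields = {label: float(value)
          for value, label in re.findall(r"([\d.]+)%(\w+)", line)}

# Round the sum to one decimal to hide floating-point noise.
total = round(sum(fields.values()), 1)
print(fields)
print("total:", total)
```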

Does anybody know how to add these up to 100% or how to interpret the exceptional cases that collectd sometimes reports?

Martijn
  • `top` doesn't exactly report CPU usage in terms of how much of the CPU is being utilized; it indicates how much of a single CPU would be needed to run all the processes. You can have greater than 100%. See threads here: http://superuser.com/questions/174660/why-is-the-cpu-usage-reported-by-top-in-linux-over-100 and http://serverfault.com/questions/127059/using-top-4-processes-have-100-cpu-how – wkl Aug 15 '12 at 21:18
  • @birryree You are right, of course. If you had two cores, total CPU% in `top` might go to 200%. I forgot to mention that the VPS this was tested on has only one core. In `collectd`, however, statistics are split out per individual core. – Martijn Aug 15 '12 at 21:41

2 Answers


collectd (like top, htop, vmstat, or any other such utility) reports an average over an interval. The kernel, from which these utilities query their statistics, generally doesn't use floating-point math and doesn't necessarily try to account for everything exhaustively, so the numbers can't be 100% accurate. Sometimes they'll add up to a bit less than 100%, sometimes more. They're not intended for an audit, just as a general indication of where time is being spent.
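As a rough illustration of the interval averaging, here is a minimal sketch in Python. It assumes a Linux `/proc/stat` whose aggregate `cpu` line uses the field order documented in proc(5); the function names are hypothetical and this is not how collectd or top is necessarily implemented. The two sample lines are made-up counter values, not real measurements. In practice you would read `/proc/stat` twice, some seconds apart.

```python
# Field order of the "cpu" line in /proc/stat, per proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Return {field: ticks} for a 'cpu ...' line from /proc/stat."""
    values = [int(v) for v in line.split()[1:]]
    return dict(zip(FIELDS, values))


def percentages(before, after):
    """Share of the interval spent in each state, from two samples."""
    # guest and guest_nice are already included in user and nice,
    # so only the first eight fields are summed.
    keys = FIELDS[:8]
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in keys}
    return {k: 100.0 * d / total for k, d in deltas.items()}


# Made-up samples standing in for two reads of /proc/stat.
before = parse_cpu_line("cpu  100 0 200 700 0 0 0 0 0 0")
after = parse_cpu_line("cpu  114 0 231 753 0 0 0 2 0 0")

pct = percentages(before, after)
for name, value in pct.items():
    print(f"{name:8s} {value:5.1f}%")
```

Dividing by the sum of the deltas forces the result to total 100% by construction. A tool that instead divides each counter by the expected number of ticks in the interval, or rounds each field separately, can drift above or below 100%, particularly if the kernel's counters (steal, for instance) do not advance in step with wall-clock time.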

twalberg

I can confirm that this has nothing to do with collectd; it comes from kernel accounting. The inaccuracy is particularly substantial on tickless systems and/or in throttling states.

faxmodem