SNMP - Value of CPU processor load not reflecting reality

Question

Trying to plot CPU load on my server, with the following hardware: ProLiant DL360p Gen8 (same behavior on ProLiant DL360 G7).

The machine is running VMware ESXi 5.1.

To create a CPU spike I run dd if=/dev/zero of=/dev/null, and I know the CPU is overloaded, because I can see a correlating spike in the graphs displayed on vCenter.

[Image: CPU graph from the vCenter Performance view]

However, running this snmpwalk:

snmpwalk -v 1 -c ******** 192.168.MY_IP  1.3.6.1.2.1.25.3.3.1.2

Shows the following results:

iso.3.6.1.2.1.25.3.3.1.2.1 = INTEGER: 3
iso.3.6.1.2.1.25.3.3.1.2.2 = INTEGER: 2
iso.3.6.1.2.1.25.3.3.1.2.3 = INTEGER: 2
iso.3.6.1.2.1.25.3.3.1.2.4 = INTEGER: 3

Am I looking at the wrong MIB? Should I be multiplying these values by a constant?
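For reference, the OID being walked is hrProcessorLoad from HOST-RESOURCES-MIB, which is defined as a percentage (0 to 100) averaged over the last minute, so the values should not need scaling. Below is a minimal Python sketch for turning the walk output into per-processor numbers. The function name `parse_processor_load` and the regular expression are illustrative assumptions, not part of any SNMP library.

```python
import re

# Matches lines such as "iso.3.6.1.2.1.25.3.3.1.2.1 = INTEGER: 3".
# The last OID component is the row index of the processor entry.
LINE_RE = re.compile(r"\.25\.3\.3\.1\.2\.(\d+)\s*=\s*INTEGER:\s*(\d+)\s*$")


def parse_processor_load(walk_output):
    """Return {processor_index: load_percent} parsed from snmpwalk text output."""
    loads = {}
    for line in walk_output.splitlines():
        match = LINE_RE.search(line)
        if match:
            loads[int(match.group(1))] = int(match.group(2))
    return loads


# The output posted above, used here as sample input.
sample = """\
iso.3.6.1.2.1.25.3.3.1.2.1 = INTEGER: 3
iso.3.6.1.2.1.25.3.3.1.2.2 = INTEGER: 2
iso.3.6.1.2.1.25.3.3.1.2.3 = INTEGER: 2
iso.3.6.1.2.1.25.3.3.1.2.4 = INTEGER: 3
"""

loads = parse_processor_load(sample)
print(loads)  # {1: 3, 2: 2, 3: 2, 4: 3}
```

Read this way, the four processors report 2 to 3 percent each, which is the mismatch with the vCenter graph that the question is about.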

By the way, using HP Agentless Monitoring I was able to get some CPU stats, but not what I'm looking for; at least, I found nothing useful wading through these MIBs.

From the snmpwalk it looks like you have 4 processors, but I only see one line on your graph. How did you create the graph (based on which OID)? Your graph uses %, which I believe the SNMP result does not. — chocripple, Dec 10 '12 at 09:50
This is the graph shown for the machine on vCenter, under "Performance" in the "Monitoring" tab. It doesn't specify what exactly it's showing, but there's a specific correlation between when I start the CPU-intensive process and the spike in the graph — Ovesh, Dec 10 '12 at 12:12
So I believe the ESXi monitoring data does not come from SNMP; it's internal to ESXi. You can't compare it directly with the snmpwalk result (correct me if I'm wrong). — chocripple, Dec 11 '12 at 01:32
@Rikih OK, I can't compare it directly. But there should be at least _some_ correlation between them. There is no change in the CPU values I'm getting at all, in any of them. — Ovesh, Dec 11 '12 at 05:28
That's 25% of the total. Four fully busy CPUs would show up as 100%; dd uses only one core, so this graph shows it as 25%. — erkko, Dec 17 '12 at 06:40
@erkko Look at the values I'm posting. They are all very low numbers, not anywhere near 25% nor 100%. That's what my question is about. — Ovesh, Dec 18 '12 at 02:25

score 5 · Answer 1 · ewwhite · 2012-12-10T12:34:55.740

Try using the stress utility to generate load in Linux, please. It's very granular and makes more sense than what you're doing.

What I see you doing is generating a single-threaded I/O load on a 4-CPU virtual machine. The CPU graph you pasted in from the vSphere client shows a 25% load because you're only straining one of the four CPUs assigned to the virtual machine.
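The 25% figure is just the mean across the four CPUs. As a rough sketch (the helper name `aggregate_load` is made up for illustration, and it assumes the graph is a plain average of per-CPU utilization):

```python
def aggregate_load(per_cpu_percent):
    """Average per-CPU utilization (each 0-100) into one overall percentage."""
    return sum(per_cpu_percent) / len(per_cpu_percent)


# One CPU pinned by a single-threaded job, three idle.
print(aggregate_load([100, 0, 0, 0]))  # 25.0

# The per-CPU values from the snmpwalk output in the question.
print(aggregate_load([3, 2, 2, 3]))  # 2.5
```

By the same arithmetic, the SNMP values in the question average out to only 2.5%, well below what the graph shows.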

Download stress (which is available for most Linux distributions) and try with some specific parameters...

For instance, simply running the following on a 4-CPU virtual machine:

# stress -c 4
stress: info: [594013] dispatching hogs: 4 cpu, 0 io, 0 vm, 0 hdd

yields...

[Image: resulting CPU graph in the vSphere client]

edited Dec 10 '12 at 12:34

answered Dec 10 '12 at 10:52

ewwhite


However, there is a definite correlation between when I start the CPU-intensive process and the spike in the graph. So why don't I see any change in the metrics of any of the 4 CPUs? – Ovesh Dec 10 '12 at 12:13
What are you running `snmpwalk` against? It's not very clear. – ewwhite Dec 10 '12 at 12:21
Sorry, should have made it clearer. It's running against the SNMP agent running on the machine itself, not against the iLO. The iLO itself (as far as I could see) doesn't offer any CPU load data. – Ovesh Dec 10 '12 at 12:38
@Ovesh SNMP on what machine? The virtual machine or the ESXi? – pauska Dec 10 '12 at 15:26
the ESXi itself – Ovesh Dec 10 '12 at 21:47
@Ovesh That does not make sense. You should be querying your guest OS's SNMP stack. – ewwhite Dec 10 '12 at 21:50
But I need to plot performance of the physical machine. It might not have any guests on it at a given time. Why does that not make sense? – Ovesh Dec 11 '12 at 00:07
Well, you can get this information out of vCenter. Are you using a licensed version of VMware, or are you on the free version of ESXi? Also, what software will you be using to generate your graphs? – ewwhite Dec 11 '12 at 22:42
Licensed version of VMware. I'll be using RRD to generate graphs. How do I access the data programmatically? – Ovesh Dec 18 '12 at 02:24

score 1 · Answer 2 · answered Dec 10 '12 at 11:29

VMware doesn't collect this information, and there's really no good way for it to do so. The problem is that it has no way to know when you're going to ask, so in order to make this work, it would always have to have an average ready for the past sixty seconds. Since you might ask now and then ask again a second later, it would have to properly count the CPU time from forty seconds ago towards both intervals. That's a really ugly, complicated thing to do.

Supporting this would add a high cost as the SNMP agent would have to constantly probe the CPU usage and update multiple intervals all running at the same time.
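To make that bookkeeping concrete, here is a minimal Python sketch of a rolling average kept in a fixed-size buffer. It assumes one utilization sample per second, and the class name `RollingAverage` is hypothetical. It shows only what "always having the last sixty seconds ready" involves, and is not a description of how any real SNMP agent is implemented.

```python
from collections import deque


class RollingAverage:
    """Keep the most recent N samples so an average is ready at any moment."""

    def __init__(self, window_seconds=60):
        # deque with maxlen silently drops the oldest sample once full.
        self.samples = deque(maxlen=window_seconds)

    def add(self, busy_percent):
        """Record one sample; must be called every second, asked or not."""
        self.samples.append(busy_percent)

    def average(self):
        """Return the mean of the samples currently in the window."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


# Small window for demonstration: two idle seconds, then a busy spell.
window = RollingAverage(window_seconds=4)
for sample in (0, 0, 100, 100):
    window.add(sample)
print(window.average())  # 50.0

# One second later the oldest idle sample falls out of the window.
window.add(100)
print(window.average())  # 75.0
```

The sampling has to run continuously whether or not anyone ever queries the value, which is the ongoing cost described above.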

So what does this chart actually show? — Ovesh, Dec 10 '12 at 12:04
