CPU time percentage algorithm problems

Question

I'm having issues with a Java algorithm I created to convert nanosecond CPU time usage (obtained via JMX) to a percentage out of 100%. The algorithm appears to be giving numbers greater than 100%, which I assume to be due to multiple available processors, although the code should sort this out. The algorithm can be seen below. cpuTimeDiffNS is the amount of CPU time used in nanoseconds, while periodMS is the sampled period.

public static final double getCPUPerc(long cpuTimeDiffNS, long periodMS) {
    if (periodMS == 0) return 0;
    double cpuTimeDiffMS = cpuTimeDiffNS / 1000000d;
    int procs = Runtime.getRuntime().availableProcessors();
    long availableTime = periodMS * procs;
    double perc = cpuTimeDiffMS / availableTime;
    return perc * 100d;
}

Here's some samples from the data acquisition:

0
87.5
133.8288232
160.8231707
197.7896341
209.6036585
248.822774
274.3902439
267.9115854
271.3414634
277.1067759
283.1554878
272.1036585
279.4000734
283.9176829
283.5365854
275.9146341
282.4578033
278.9634146
261.0536937
254.6071775
286.662182
278.9634146
276.7245597
288.4908537
281.6933708
286.9664634
279.7822896
276.2957317
280.4878049
275.5335366
271.7557485
280.8689024
287.2689689
281.6933708
267.5097276
273.2469512
286.1735835
289.6341463
296.875
279.4000734
289.2530488
282.8400196
288.4908537
287.4266145
288.1097561
286.5853659
288.9554795
238.1207192
288.4908537
288.7063531
290.3963415
286.662182
277.4390244
290.4843444
281.6310976
271.7557485
272.8658537
283.2222358
250.7621951

Edit: as requested, the input gathering functions (you can probably ignore this):

// returns CPU time in NS for a thread group (recursively)
public static long getCPUTime(ThreadGroup tg) {
    synchronized (TGLOCK) {
        int size;
        do {
            size = tg.enumerate(tgThreads, true);
            if (size <= tgThreads.length) continue;
            tgThreads = new Thread[size];
        } while (size > tgThreads.length);

        long totalTime = 0;
        for (int i = 0; i < size; i++) {
            totalTime += getCPUTime(tgThreads[i]);
        }
        return totalTime;
    }
}

public static long getCPUTime(Thread t) {
    return threadMXBean.getThreadCpuTime(t.getId());
}

public static ThreadGroup getRootThreadGroup() {
    // Find the root thread group
    ThreadGroup root = Thread.currentThread().getThreadGroup().getParent();
    while (root.getParent() != null) {
        root = root.getParent();
    }
    return root;
}

and the inputs (again, you can probably ignore this):

    simCPUTimeNS     = getCPUTime(kks.getSimThreadGroup());
    appsCPUTimeNS    = getCPUTime(kks.getAppThreadGroup());
    lwjns3CPUTimeNS  = getCPUTime(kks.getKKSThreadGroup());
    simCoreCPUTimeNS = getCPUTime(kks.getSimThread());
    totalCPUTimeNS   = getCPUTime(getRootThreadGroup());

    simCPUTimeNSDiff  = simCPUTimeNS - lastSimCPUTimeNS;
    appsCPUTimeNSDiff = appsCPUTimeNS - lastAppsCPUTimeNS;
    lwjns3CPUTimeNSDiff = lwjns3CPUTimeNS - lastLwjns3CPUTimeNS;
    simCoreCPUTimeNSDiff = simCoreCPUTimeNS - lastSimCoreCPUTimeNS;
    totalCPUTimeNSDiff = totalCPUTimeNS - lastTotalCPUTimeNS;

    lastSimCPUTimeNS     = simCPUTimeNS;
    lastAppsCPUTimeNS    = appsCPUTimeNS;
    lastLwjns3CPUTimeNS  = lwjns3CPUTimeNS;
    lastSimCoreCPUTimeNS = simCoreCPUTimeNS;
    lastTotalCPUTimeNS   = totalCPUTimeNS;

    simCPUPerc     = getCPUPerc(simCPUTimeNSDiff, currDiffMS);
    appsCPUPerc    = getCPUPerc(appsCPUTimeNSDiff, currDiffMS);
    lwjns3CPUPerc  = getCPUPerc(lwjns3CPUTimeNSDiff, currDiffMS);
    simCoreCPUPerc = getCPUPerc(simCoreCPUTimeNSDiff, currDiffMS);
    totalCPUPerc   = getCPUPerc(totalCPUTimeNSDiff, currDiffMS);

Cheers for any help, I'm sure the answer is obvious ;)
Chris

Hope this post can help you better:http://stackoverflow.com/questions/47177/how-to-monitor-the-computers-cpu-memory-and-disk-usage-in-java — James Jithin, Oct 21 '11 at 18:07
What JMX call and calculations are you using to get the `cpuTimeDiffNS`? — Gray, Oct 21 '11 at 19:31
Typically when I'm having trouble like this, I'd have it emit each of the intermediate values you're calculcating. For instance, are your inputs what you expect? Is cpuTimeDiffMS being computed correctly? procs? availableTime? the first incarnation of perc? Or is it only the returned value that is off? — Patrick87, Oct 21 '11 at 19:48

Gray · Answer 1 · 2011-10-23T02:09:36.833

1

So we use similar code to calculate the load average and (as it turns out) it has a bug that may also be in your code. We use getAllThreadIds() but it only returns the "active" threads and enumerate does this also. If any of your threads stop then the total cpu time may go down. I don't see how this would cause the the values to go above 100%.

Couple comments about your code:

Why the synchronized (TGLOCK) line? Is this to synchronize the ThreadGroup object?
if (size <= tgThreads.length) continue; should be a break;. There's no need to double test.
enumerate returns the number of threads put in the array. This is always going to be <= tg.length so the array will never grow if I'm readying it right. If it did return a larger size then you'd be getting NPEs since you have tgThreads = new Thread[size]; right before the while check which will never be true.
Is there a reason you are using the ThreadGroup at all? We are using the following which doesn't need to recurse or anything:
```
for (long id : threadMxBean.getAllThreadIds()) {
    long cpuTime = threadMxBean.getThreadCpuTime(id);
```

Hope this helps at least somewhat.

edited Oct 23 '11 at 02:09

answered Oct 22 '11 at 02:16

Gray

115,027
24
293
354

I'm using ThreadGroup because my threads are organised neatly into groups, and I need to enumerate the properties of the threads within these groups. The `synchronized` block is because of the `tgThreads` static field, which I wanted to protect access to that. Double testing is necessary because `tg.enumerate(..)` returns a size which may be smaller than the passed array, in which case you need to create a new array with the given size and pass that, so that elements don't drop off the end. Maybe a do { } while { } isn't needed, but it doesn't matter much. – Chris Dennett Oct 22 '11 at 18:35
I don't think you quite understand. First you are doing `size <= tgThreads.length` and the calling `continue` which takes you to `while (size > tgThreads.length)`. That's an unnecessary test since it will always be false. – Gray Oct 23 '11 at 01:38
And the while () loop seems extraneous because it will never be true. size will always be == `tgThreads.length` because of the new right before. – Gray Oct 23 '11 at 01:39
It'll return when size <= tgThreads.length, which is intended. The method keeps going until it gets an array until it has an array which has enough room to fit all the threads, or it'll return on the first iteration with an array which is large enough already, and the size is stored so that threads from previous calls aren't iterated over. That code returns fine, there's no issues with it :) – Chris Dennett Oct 23 '11 at 01:53
1

Ah I see why it works. Read the javadocs for `enumerate`. It never returns a size > tg.length. Regardless of what you say dude, there are bugs in that loop. But I'll agree that they don't explain the > 100%. I bet if you print out the size of the array, it's the same as it was when it was initially created. – Gray Oct 23 '11 at 02:07

score 0 · Accepted Answer · answered Oct 24 '11 at 02:12

I'm running on a time-dilated JVM (5x slowdown), and it looks like I forgot to dilate in one part of the C++ JVM code in os_windows.cpp (os::thread_cpu_time) when making adjustments. Oops. It uses timeGetTime(), a Windows time function. That'd explain it.

CPU time percentage algorithm problems

2 Answers2