I like to fully load our compute hardware so that no CPU time is wasted. On typical in-house hardware this is fairly easy: load the machine with as many runnable threads as there are cores, and idle time goes to zero.

Here is an example app:

public class Looper
{
    public static void main(String[] args)
    {
        while (true) { new java.util.Random().nextBytes(new byte[4096]); }
    }
}

On our in-house 8-core hardware, I can run 8 of these and idle time (as reported by mpstat and top) goes to zero. I can even add a 9th or 10th process, and idle time stays very close to zero.
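For anyone who wants to reproduce this without starting several JVMs by hand, here is a minimal sketch of the same load using threads in one process. It is a variation on Looper, not the test that was actually run (that used separate processes). The class name MultiLooper, the burn helper and the 60-second duration are made up for illustration.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class MultiLooper
{
    // Runs `threads` busy loops for roughly `millis` milliseconds and
    // returns the total number of 4096-byte buffers filled.
    public static long burn(int threads, long millis) throws InterruptedException
    {
        final AtomicLong iterations = new AtomicLong();
        final long deadline = System.nanoTime() + millis * 1_000_000L;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(() -> {
                Random random = new Random();
                byte[] buffer = new byte[4096];
                long count = 0;
                // Same CPU-bound work as Looper, but bounded by a deadline.
                while (deadline - System.nanoTime() > 0)
                {
                    random.nextBytes(buffer);
                    count++;
                }
                iterations.addAndGet(count);
            });
            workers[i].start();
        }
        for (Thread worker : workers) { worker.join(); }
        return iterations.get();
    }

    public static void main(String[] args) throws InterruptedException
    {
        // Default to one thread per core; pass a number to oversubscribe.
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : cores;
        long total = burn(threads, 60_000);
        System.out.println(threads + " threads, " + total + " iterations");
    }
}
```

Run it with no arguments for one thread per core, or with 9 or 10 as the argument to oversubscribe, and watch mpstat or top while it runs.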

On EC2 (c1.xlarge instances), however, idle time is much higher than I'd expect. At 8 processes, idle time hovers around 1%, and with 9 or 10 processes it can increase to 2%-3% or higher. With more complicated programs (not the example above), idle time can be even higher than that.

Can anyone explain this? This is with very recent Amazon kernels, and does not include stolen CPU time, which I would expect to see on EC2. Is this a problem with EC2 in particular, or is it general to Xen? Are there known workarounds?

plinehan
  • I'm just curious, why do you like to fully-load your hardware? How does that reduce wasted CPU time? – Rob Olmos Sep 09 '10 at 22:47
  • It sounds like "wasting CPU time" is what you're doing. Could you explain a bit more why you're doing what you're doing and why you're expecting to see these results? – Philip Reynolds Sep 09 '10 at 23:01
  • For the purposes of this discussion, let's say I'm running a render farm. My workload is CPU bound, and I'd like to pay for exactly as many CPU hours as I'll actually use. Any time spent idling is paying for CPU that I'm not using. Because my load isn't variable (unlike, say, web serving), I don't have to worry about having excess capacity/slack to handle workload spikes. In reality, I don't mind the load being somewhat less than 100%, but in our experience it can be as low as 80-90%. I created the particular toy problem above just for demonstration purposes. – plinehan Sep 10 '10 at 18:52

1 Answer

On EC2, the idle and steal values commonly appear higher than you would see on bare metal. This is normal and is due to how the virtualization works. You are probably not losing available CPU time in this case; it's just an artifact of how the system functions. When you check CPU utilization, make sure you're using a Xen-aware version of the tool, one that understands how to identify CPU time on a Xen-based VM.
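As a rough cross-check that doesn't depend on any particular tool version, you can read the raw counters from /proc/stat yourself. This is a minimal sketch, not one of the Xen-aware tools mentioned above. It assumes a Linux guest whose aggregate "cpu" line uses the field order documented in proc(5): user, nice, system, idle, iowait, irq, softirq, steal. The class name CpuSample and the 5-second interval are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CpuSample
{
    // Assumed field positions in the "cpu" line, per proc(5).
    public static final int IDLE = 3;
    public static final int STEAL = 7;

    // Parses a line such as "cpu  100 0 50 800 10 0 0 40 0 0" into tick counters.
    public static long[] parseCpuLine(String line)
    {
        String[] parts = line.trim().split("\\s+");
        long[] ticks = new long[parts.length - 1];
        for (int i = 1; i < parts.length; i++)
        {
            ticks[i - 1] = Long.parseLong(parts[i]);
        }
        return ticks;
    }

    // Share of the interval, in percent, spent in the counter at `field`.
    public static double percent(long[] before, long[] after, int field)
    {
        // Only the first 8 counters are summed; guest time is already
        // counted inside user/nice, so adding it would double count.
        int n = Math.min(8, Math.min(before.length, after.length));
        long total = 0;
        for (int i = 0; i < n; i++) { total += after[i] - before[i]; }
        if (total <= 0 || field >= n) { return 0.0; }
        return 100.0 * (after[field] - before[field]) / total;
    }

    private static long[] sample() throws IOException
    {
        // The first line of /proc/stat is the aggregate over all CPUs.
        return parseCpuLine(Files.readAllLines(Paths.get("/proc/stat")).get(0));
    }

    public static void main(String[] args) throws IOException, InterruptedException
    {
        long[] before = sample();
        Thread.sleep(5000);
        long[] after = sample();
        System.out.printf("idle %.1f%%  steal %.1f%%%n",
                percent(before, after, IDLE), percent(before, after, STEAL));
    }
}
```

If steal comes back as 0.0 on a kernel old enough to lack that column, the output says nothing about steal either way, so treat it as a sanity check rather than a measurement.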

Nathan V