Sysbench on same level CPU, why Google VM is much faster than AWS ec2?

Question

sysbench --test=cpu --cpu-max-prime=20000 run

Google VM:
standard 1 cpu(asia-east1), E5 v2 2.5G,
execution time: ~28 seconds

standard 2 cpu(asia-east1), E5 v2 2.5G,
execution time: ~28 seconds

AWS ec2:
m3.medium, E5 v2 2.5G,
execution time: ~59 seconds

m3.large, E5 v2 2.5G,
execution time: ~28 seconds

Edited:
Just now I tested two m3.medium instances and two m3.large instances, and found that only m3.medium is slow. All the m3.medium instances I tested are slow (~59 seconds).

Same family of CPU, but different hardware, different OS, different environment, possibly different versions of sysbench. m3 instances are previous generation, m4 are E5 but v3 or v4. This isn't exactly a well defined test. — Tim, Apr 28 '17 at 19:49
@Tim Same OS, same environment (all brand new), same version of sysbench. I know there is an m4, but I'm not comparing AWS vs Google. I'm just curious about the significant performance difference between CPUs of the same family. — Elect2, Apr 28 '17 at 20:09
I don't think anyone can really answer your question. I ran a c4.large and got a total time of 25.02s. If you want to do a more comprehensive comparison there's an interesting article [here](http://dtrace.org/blogs/brendan/2014/01/10/benchmarking-the-cloud/). Someone has probably already done it. — Tim, Apr 28 '17 at 21:13
@Tim I have explained that I'm not trying to figure out which cloud platform is faster. I'm asking what affected the CPU performance so much. I clearly described my question in the Title. — Elect2, Apr 29 '17 at 07:33

score 2 · Answer 1 · answered Apr 30 '17 at 20:04

On most small/medium instances, AWS aggressively time-shares CPU time between multiple virtual machines. This means that any process that appears "running" from the guest side can really be "suspended/waiting" on the host side, lowering total performance.

Other cloud providers seem to time-share even their small instances somewhat less: for example, a small Azure machine gave me much faster CPU performance than a similar AWS instance.

However, VM provisioning/sizing can be quite complex, with many options to consider. For example, when an AWS machine is idling, it collects "CPU credits" that it can spend on short, fast bursts. For more information, have a look here.
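
One way to check whether the hypervisor is holding the guest back is to look at steal time. The following is a minimal sketch, assuming a Linux guest whose aggregate `cpu` line in `/proc/stat` uses the standard field order (user, nice, system, idle, iowait, irq, softirq, steal, ...). The function names are hypothetical, chosen only for this illustration.

```python
def parse_cpu_line(line):
    """Return (total_jiffies, steal_jiffies) from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in fields[1:]]
    # Assumed field order: user nice system idle iowait irq softirq steal ...
    # Guest time is already included in user time, so only the first
    # eight fields are summed to avoid counting it twice.
    total = sum(values[:8])
    steal = values[7] if len(values) > 7 else 0
    return total, steal


def steal_percent(before, after):
    """Share of CPU time stolen between two samples of the 'cpu' line."""
    total0, steal0 = parse_cpu_line(before)
    total1, steal1 = parse_cpu_line(after)
    delta = total1 - total0
    return 100.0 * (steal1 - steal0) / delta if delta > 0 else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first (aggregate) line of /proc/stat."""
    with open(path) as f:
        return f.readline()
```

To use it, call `read_cpu_line()` once before starting the benchmark and once after it finishes, then pass both strings to `steal_percent`. A value that stays near zero would suggest that stolen CPU time is not what slows the instance down.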

answered Apr 30 '17 at 20:04

shodanshok


CPU credits apply to T2 instance types only; the test was done with an M3 instance, so they don't apply. The M3 gives full access to a single core, though I don't know if it's a dedicated core or just scheduled access that aggregates to 100%. I would be surprised if the M series oversells CPU capacity, but I would somewhat expect it with the T series. A "Noisy Neighbour" is a possible issue, as is [CPU stealing](https://www.datadoghq.com/blog/understanding-aws-stolen-cpu-and-how-it-affects-your-apps/). Neither is consistent, so they don't answer the question asked. – Tim Apr 30 '17 at 20:22
@Tim time-sharing and/or CPU stealing are the only two things that can explain the lower performance, I suppose. This should be easily verifiable by the OP by simply looking at the steal time in `top` while the benchmark runs. Thanks for the link. – shodanshok Apr 30 '17 at 20:36
I didn't mean those are the only two options; there can be many reasons performance is lower than expected. It'd be interesting to see what a physical machine does. My Surface Pro 4, with Ubuntu in VirtualBox, got 28.9. Given the M4 gets around the same as Azure and a modern PC, I guess the question is: why is the m3 so slow? – Tim Apr 30 '17 at 20:58

score 0 · Answer 2 · answered Apr 30 '17 at 19:26

When you get a regular VM on most cloud providers you are not getting dedicated resources; otherwise you would be getting a dedicated server, which is normally much more expensive. Of course this depends on the provider's implementation, but in general there is always overcommitment and oversubscription. The higher the density (bigger hardware and more VMs), the better you can stabilize performance on individual VMs, but it all depends on many factors such as CPU scheduling algorithms, density, VM load balancing, etc.

Azure, for example, tries to guarantee a certain performance for different VM sizes, but in reality it varies widely depending on many factors: your VM is not running alone on the hardware, it is running alongside many others.

score 0 · Answer 3 · answered Apr 30 '17 at 22:37

Other Benchmarks

I found an interesting benchmark here on VPS Benchmarks. Note that their graphs are misleading because the scale doesn't start at 0, so the graphs are pretty much useless. The numbers behind the tests seem fine.

Their test compares an AWS t2.small (1 core, 2GB RAM) with a GCE n1-standard-1. The t2 instances aren't a great comparison for the n1-standard, since they have burstable CPU performance whereas GCE provides constant CPU, but it's the only suitable test I can find.

The t2 instances are reputed to run on older AWS hardware (m1 generation), whereas the M3/M4 AWS instances are newer. The GCE test was done a lot more recently as well.

Individual Tests

These all refer to the test linked above.

The CPU test is close, within 3%.

File IO random read isn't close at all: AWS gets 24Mbps and GCE 1787Mbps. I know that in AWS your I/O is closely related to your instance type, and small instances get a lot less I/O than large instances. Given this huge discrepancy, and the other tests being roughly similar, I would want to see this retested before I trust the numbers. I have read that GCE does do very well for network I/O. It could also be that the GCE test was done with local SSD and the AWS test with network-attached storage.

Other IO tests are roughly similar. Sometimes AWS is higher, sometimes GCE is higher, but there's no clear winner.

Memory tests are roughly similar, with AWS edging out Google.

Notes

Any single test on any instance on any provider could come in low for a wide variety of reasons. Over-provisioned hardware, a noisy neighbour taking more than their share of resources, and CPU Stealing are just a few examples.

A good test would use a variety of tests (CPU, I/O, memory, etc), and would be run on at least three separate virtual machines.
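
As a rough illustration of comparing repeated runs rather than a single number, here is a minimal sketch that summarizes the execution times collected for each instance. The `summarize` helper, the labels and the sample times are hypothetical placeholders, not measurements from this thread.

```python
import statistics


def summarize(runs):
    """Summarize repeated benchmark times.

    runs: dict mapping an instance label to a list of execution
    times in seconds (one entry per benchmark run).
    """
    summary = {}
    for label, times in runs.items():
        mean = statistics.mean(times)
        # Sample standard deviation needs at least two runs.
        stdev = statistics.stdev(times) if len(times) > 1 else 0.0
        summary[label] = {
            "n": len(times),
            "mean": mean,
            "stdev": stdev,
            # Coefficient of variation: run-to-run spread relative to the mean.
            "cv_percent": 100.0 * stdev / mean,
        }
    return summary


# Placeholder data only: substitute the times you actually measured.
example_runs = {
    "vm-a": [10.0, 12.0, 14.0],
    "vm-b": [20.0, 20.5, 21.0],
}

for label, stats in summarize(example_runs).items():
    print(f"{label}: n={stats['n']} mean={stats['mean']:.2f}s "
          f"stdev={stats['stdev']:.2f}s cv={stats['cv_percent']:.1f}%")
```

A large coefficient of variation on one instance points towards transient effects such as a noisy neighbour, while a consistently higher mean across several instances of the same type points towards something systematic.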

Conclusion

AWS and GCE seem to perform roughly similarly on these reasonably well documented tests, even though the instance types and hardware are quite different.

I would like to see @StanHou do significantly more robust, well documented tests to compare performance, rather than rely on what could be a single test on a single instance.

The T2 instance type was announced in 2014 and is newer than m3. I have tested several T2 instances and their CPUs are all E5-2676 v3 (same as m4). In GCE, v2 and v3 CPU performance is almost the same. In AWS, v2 (m3 instance) is 50% slower than v3 (t2 or m4). That's the point of my question. — Elect2, May 01 '17 at 11:51
How do you find the CPU type of an instance? It's not in instance metadata. What I read suggested that t2 instances reused older hardware, as people migrate to newer instances. The main point of my comment is that the single test you did doesn't mean all m3 instances are half the speed of that GCE instance; I'd want to see a few more tests. — Tim, May 01 '17 at 18:54
I get the CPU info from /proc/cpuinfo. Among t2.micro, m3.medium and m3.large, only m3.medium is "special" (I have updated my question details). Leaving m3.medium aside, there is no significant difference between the performance of EC2 and GCE. Maybe I should edit my question to remove the GCE part. — Elect2, May 01 '17 at 19:46
That's interesting. I think this is worth asking on the AWS forums. You'd probably want to test maybe a few instances, a few tests of each, to show a pattern. I tested m4.large (2 cores) and using one core it was twice the speed. My t2.nano is "E5-2676 v3 @ 2.40GHz". — Tim, May 01 '17 at 20:01
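
For recording the CPU model next to each benchmark result, as discussed in the comments above, here is a minimal sketch that parses `/proc/cpuinfo`. It assumes a Linux guest on x86, where each logical CPU has a `model name` line; the function names are hypothetical.

```python
from collections import Counter


def cpu_models(cpuinfo_text):
    """Count logical CPUs per 'model name' in /proc/cpuinfo-style text."""
    models = Counter()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Assumption: the CPU model is reported under the key 'model name'.
        if sep and key.strip() == "model name":
            models[value.strip()] += 1
    return models


def read_cpu_models(path="/proc/cpuinfo"):
    """Read and parse the cpuinfo file of the running machine."""
    with open(path) as f:
        return cpu_models(f.read())
```

Calling `read_cpu_models()` on each instance before a run makes it easy to tell afterwards whether two results came from the same CPU generation.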
