
I have the following problem. I run several stress tests on a Linux machine

$ uname -a
Linux debian 3.14-2-686-pae #1 SMP Debian 3.14.15-2 (2014-08-09) i686 GNU/Linux

It's an Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz with 8 GB RAM and a 300 GB HDD.

These tests are not I/O intensive; I mostly compute double-precision arithmetic and time it in the following way:

start = rdtsc();          /* read the time-stamp counter before the workload */
do_arithmetic();
stop = rdtsc();
diff = stop - start;      /* elapsed TSC ticks */
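
For completeness, a minimal, self-contained version of this loop looks like the following; rdtsc() here is the usual inline-asm wrapper, and do_arithmetic() is just a placeholder standing in for the real workload:

#include <stdint.h>
#include <stdio.h>

/* Read the time-stamp counter. Note that RDTSC is not a serializing
   instruction, so out-of-order execution can blur the timed region. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Placeholder for the real double-precision workload. */
static double do_arithmetic(void)
{
    double acc = 1.0;
    for (int i = 0; i < 1000000; i++)
        acc = acc * 1.000001 + 0.5;
    return acc;
}

int main(void)
{
    uint64_t start = rdtsc();
    volatile double result = do_arithmetic();  /* volatile keeps the call from being optimized away */
    uint64_t stop = rdtsc();
    printf("diff = %llu cycles (result %f)\n",
           (unsigned long long)(stop - start), (double)result);
    return 0;
}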

I repeat these tests many times, running my benchmarking application on a physical machine or on a KVM-based VM:

qemu-system-i386 disk.img -m 2000 -device virtio-net-pci,netdev=net1,mac=52:54:00:12:34:03 -netdev type=tap,id=net1,ifname=tap0,script=no,downscript=no -cpu host,+vmx -enable-kvm -nographic

I collect statistics (i.e., the diffs) over many trials. For the physical machine (unloaded), the distribution of processing delays is most likely a very narrow lognormal.

When I repeat the experiment on the virtual machine (with both the physical and the virtual machine unloaded), the lognormal distribution is still there (a little wider), but I also collect a few points with completion times much shorter (about two times shorter) than the absolute minimum gathered on the physical machine! (Notice that the completion-time distribution on the physical machine is very narrow and lies close to the minimum value.) There are also some points with completion times much longer than the average completion time on the hardware machine.

I guess that my rdtsc benchmarking method is not very accurate in the VM environment. Can you please suggest a method to improve my benchmarking system so that it provides reliable (comparable) statistics between the physical and the KVM-based virtual environment? At the very least, something that won't show the VM being 2x faster than a hardware PC in a small number of cases.

Thanks in advance for any suggestions or comments on this subject.

Best regards

Eryk

2 Answers


Maybe you can try clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts); see man clock_gettime for more information.
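
A minimal sketch of that suggestion (the timed region is a placeholder; on older glibc you may need to link with -lrt):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, stop;

    /* Per-thread CPU time: counts only the time this thread spends on-CPU. */
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
    /* ... workload under test goes here ... */
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &stop);

    long long diff_ns = (stop.tv_sec - start.tv_sec) * 1000000000LL
                      + (stop.tv_nsec - start.tv_nsec);
    printf("elapsed: %lld ns\n", diff_ns);
    return 0;
}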

ioilala
  • I don't expect any improvement, since clock_gettime with the CLOCK_THREAD_CPUTIME_ID argument is built on the TSC on Intel anyway. – Eryk Sep 11 '14 at 07:11

It seems that it's not a problem with rdtsc at all. I am using my Intel Core i5 with a fixed, limited frequency set through the acpi_cpufreq driver with the userspace governor. Even though the CPU speed is fixed at, say, 2.4 GHz (out of 3.3 GHz), some calculations are performed at the maximum speed of 3.3 GHz. Roughly speaking, I also encountered a very small number of such cases on the physical machine, about 1 per 10,000. On KVM this behavior occurs far more often, in about a few percent of trials. I will investigate this problem further.
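
One way to catch such trials (my own sketch, assuming the standard cpufreq sysfs layout and that the benchmark is pinned to cpu0, e.g. with taskset): read scaling_cur_freq around each trial and flag any trial where the reported frequency differs from the pinned value.

#include <stdio.h>

/* Return the frequency cpufreq currently reports for cpu0, in kHz,
   or -1 on error. Assumes the sysfs path exposed by acpi_cpufreq. */
static long read_cur_freq_khz(void)
{
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
    long khz = -1;
    if (f) {
        if (fscanf(f, "%ld", &khz) != 1)
            khz = -1;
        fclose(f);
    }
    return khz;
}

int main(void)
{
    long pinned = read_cur_freq_khz();   /* e.g. 2400000 when pinned at 2.4 GHz */
    printf("pinned frequency: %ld kHz\n", pinned);

    for (int trial = 0; trial < 10000; trial++) {
        /* ... run one timed trial here ... */
        long now = read_cur_freq_khz();
        if (now != pinned)
            printf("trial %d: frequency changed to %ld kHz\n", trial, now);
    }
    return 0;
}

Note that scaling_cur_freq can lag behind the actual hardware state, so this only gives a coarse indication.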

Eryk