I am trying to measure the performance difference between QEMU-KVM and the host machine, and I cannot work out how to tune QEMU-KVM to achieve near-native performance.

I installed QEMU-KVM with Lubuntu 14.04 and am running a stress microbenchmark that does not produce any cache misses.

I am recording the instructions-retired performance counter using the perf monitoring tool.

Since QEMU does not provide this performance counter, I am recording the performance of the entire QEMU process using perf from the host system.
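
For reference, host-side counting of this kind might look roughly like the sketch below. It is only an illustration, not the exact command used here: the helper name `count_instructions` is hypothetical, and the process name `qemu-system-x86_64` in the usage comment is an assumption about how the guest was started.

```shell
# Hypothetical helper: count instructions retired by an already-running
# process (for example the QEMU process) from the host.
# Usage: count_instructions <pid> [seconds]   (default window: 10 seconds)
count_instructions() (
    pid="$1"
    secs="${2:-10}"
    # -p attaches to the existing process; the trailing "sleep" command
    # only defines how long perf keeps counting.
    perf stat -e instructions -p "$pid" -- sleep "$secs"
)

# Example call, assuming the guest runs as a qemu-system-x86_64 process:
#   count_instructions "$(pgrep -f qemu-system-x86_64 | head -n 1)" 30
```

Measured this way, the count covers everything the QEMU process does during the window, not only the benchmark inside the guest.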

The results obtained do not reflect near-native performance, and I am not entirely sure how I should set up the QEMU-KVM subsystem.

I describe details of QEMU and the host machine (bare metal) below.

QEMU emulator version 2.0.0 in conjunction with KVM as the virtual environment and libvirt 1.2.2

The guest machine is running kernel version 3.19.0-15-generic and the host machine is running version 3.14.5-031405-generic on an x86_64 machine.

The guest machine has an Intel Sandy Bridge processor (model name: Intel Xeon E312xx) with the following settings: sockets=1,cores=1,threads=1 and a 4 MB cache.
More details:
cpu family    : 6
model        : 42
max freq        : 2394.560 MHz
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm vnmi ept xsaveopt


The host machine has an Intel Sandy Bridge processor (Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz) with 4 cores and a 6 MB cache.
cpu family    : 6
model        : 42
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
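
The two flag lists are long, so a small helper can make the difference easier to see. The sketch below is an assumption-laden illustration (the function name `flags_only_in_first` is hypothetical); it prints the flags present in the first list but missing from the second. Applied to the lists above, one of the flags only the host reports is `arch_perfmon`.

```shell
# Hypothetical helper: print the flags that appear in the first
# space-separated list but not in the second, one per line, sorted.
flags_only_in_first() (
    a=$(mktemp)
    b=$(mktemp)
    # Put each flag on its own line, drop empty lines, sort for comm.
    printf '%s\n' "$1" | tr -s '[:space:]' '\n' | sed '/^$/d' | sort -u > "$a"
    printf '%s\n' "$2" | tr -s '[:space:]' '\n' | sed '/^$/d' | sort -u > "$b"
    # comm -23 keeps only the lines unique to the first file.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
)

# Example call, with the flag lines pasted from /proc/cpuinfo on each machine:
#   flags_only_in_first "$host_flags" "$guest_flags"
```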

Let me know if I can provide any more details.

Thanks!

tandem
  • So what exactly is your question? You don't see more retired instructions in the virtual machine? Have you looked at the "unhalted cycles" event? – Simple.guy Jul 08 '15 at 15:17
  • I believe the problem is more likely that it is a part of the kernel and `perfmon` does not allow me to collect kernel-level statistics, whereas `perf` does. – tandem Jul 09 '15 at 19:25

1 Answer

Try using event modifiers, as explained in "perf help list":

EVENT MODIFIERS
       Events can optionally have a modifier by appending a colon and one or more
       modifiers. Modifiers allow the user to restrict the events to be counted. The
       following modifiers exist:

           u - user-space counting
           k - kernel counting
           h - hypervisor counting
           G - guest counting (in KVM guests)
           H - host counting (not in KVM guests)
           p - precise level
           S - read sample value (PERF_SAMPLE_READ)
           D - pin the event to the PMU

The documentation above is not satisfactory IMHO, and I don't fully understand the meaning of each event modifier. Sometimes I get inconsistent results, e.g. more retired instructions in user-space counting ("u" modifier) than in full counting (no modifiers at all).
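
As a rough sketch of what I mean, the modifiers are appended to the event name after a colon. The helper names `event_list` and `count_by_modifier` below are hypothetical, and I am assuming the QEMU process is measured from the host by PID; treat it as a starting point rather than a verified recipe.

```shell
# Hypothetical helper: build a perf event list such as
# "instructions:u,instructions:k" from an event name and modifiers.
event_list() (
    event="$1"
    shift
    list=""
    for m in "$@"; do
        # Append ",<event>:<modifier>", without a leading comma on the first item.
        list="${list:+$list,}$event:$m"
    done
    printf '%s\n' "$list"
)

# Hypothetical helper: count the unmodified event plus the u, k, G and H
# variants for an existing process over a fixed window.
# Usage: count_by_modifier <pid> <seconds>
count_by_modifier() (
    pid="$1"
    secs="$2"
    perf stat -e "instructions,$(event_list instructions u k G H)" \
        -p "$pid" -- sleep "$secs"
)
```

Comparing the unmodified count with the per-modifier counts from the same run is one way to spot the kind of inconsistency described above.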

I'm using the following perf version:

>> perf --version
perf version 3.13.11-ckt20
Simple.guy