
Where do the CPU capabilities exposed to libvirt VMs come from? We see different CPU capability lists from different sources. We created a VM with CPU mode 'host-passthrough', but it did not get all of the flags that lscpu shows on the physical node. Why are there different CPU flag sets in /usr/share/libvirt/cpu_map/ and in "virsh capabilities"?

Flag count is 133:

root@physical_node:~# lscpu | grep Flags
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

Flag count is 72:

root@physical_node:~# grep -oP "(?<=feature name=['\"]).*(?=['\"])" /usr/share/libvirt/cpu_map/x86_Cascadelake-Server-noTSX.xml | xargs
3dnowprefetch abm adx aes apic arat avx avx2 avx512bw avx512cd avx512dq avx512f avx512vl avx512vnni bmi1 bmi2 clflush clflushopt clwb cmov cx16 cx8 de erms f16c fma fpu fsgsbase fxsr invpcid lahf_lm lm mca mce mmx movbe mpx msr mtrr nx pae pat pcid pclmuldq pdpe1gb pge pni popcnt pse pse36 rdrand rdseed rdtscp sep smap smep spec-ctrl ssbd sse sse2 sse4.1 sse4.2 ssse3 syscall tsc tsc-deadline vme x2apic xgetbv1 xsave xsavec xsaveopt

Flag count is 35:

root@physical_node:~# virsh capabilities
ds acpi ss ht tm pbe dtes64 monitor ds_cpl vmx smx est tm2 xtpr pdcm dca osxsave tsc_adjust cmt intel-pt pku ospke md-clear stibp arch-capabilities xsaves mbm_total mbm_local invtsc rdctl-no ibrs-all skip-l1dfl-vmentry mds-no pschange-mc-no tsx-ctrl

A VM deployed with cpu mode 'host-passthrough':

Flag count is 99 in the VM:

root@vm-1:~# lscpu | grep Flags
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat pku ospke avx512_vnni md_clear arch_capabilities

Would like to know for my better understanding.

1 Answer

Prepare to hear tales of terrible hardship, endurance and woe....

The CPU handling in QEMU and libvirt is rather complicated and non-obvious, with historical baggage that misleads people.

The lscpu command on your host is accurately reporting what the host OS is able to see from the real CPU.

The QEMU host-passthrough CPU model is intended to pass through all features of the host CPU, but in practice this is not an exact science. Some features are not virtualizable, or are intentionally blocked by the KVM kernel module, and so won't appear in the guest. Some features can be exposed to the guest even when the host doesn't have them. So the host-passthrough CPU model in QEMU will get you a guest that is pretty close to what lscpu tells you on the host, but there will likely still be a small set of differences.
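For context, host-passthrough is selected in the domain XML like this (a minimal sketch; the check attribute is optional and shown only for illustration):

```xml
<!-- Minimal domain XML fragment selecting host-passthrough.
     check='none' skips libvirt's feature-verification step; optional. -->
<cpu mode='host-passthrough' check='none'/>
```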

The /usr/share/libvirt/cpu_map/ XML files are only relevant when using a named CPU model, NOT host-passthrough. The files define libvirt's view of how QEMU originally modelled each named CPU. The problem is that QEMU's view of named CPUs changes over time, either because bugs were found in QEMU's definition, or because hardware vendors issued a microcode update that added/removed flags. With a modern version of QEMU, the set of features listed in these XML files is no longer used when running guests; instead libvirt will talk to QEMU to discover the fully up-to-date set of features. So the set of features you see in these XML files is almost always different from what the guest sees.
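The format of those files is easy to poke at. Here is a self-contained sketch, with a tiny sample standing in for a real cpu_map file (GNU grep with -P is assumed, as in the command above):

```shell
# Create a stand-in for a cpu_map model file; real files live under
# /usr/share/libvirt/cpu_map/ and use the same <feature name='...'/> elements.
cat > model.xml <<'EOF'
<model name='Demo'>
  <feature name='fpu'/>
  <feature name='sse4.1'/>
</model>
EOF

# Extract the feature names with a PCRE lookaround, one per line.
grep -oP "(?<=feature name=')[^']*(?=')" model.xml
# prints:
# fpu
# sse4.1
```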

The features in the XML files are, however, still used when you run virsh capabilities. For this command libvirt will look at the host CPU and try to find the named CPU that is the closest match. The features reported by virsh capabilities, however, will NOT match lscpu on the host, because libvirt does not expand the full feature set in that context. If virsh capabilities reports Cascadelake, it will only list features that differ between libvirt's Cascadelake XML file and your actual host. To see the full feature set that should match lscpu on the host, you need to tell libvirt to expand the feature list with virsh cpu-baseline --features. If you want to see the full feature set matching lscpu in the guest, you instead need virsh hypervisor-cpu-baseline --features.

e.g.

virsh capabilities --xpath '//host/cpu' > host-cpu.xml
virsh cpu-baseline --features host-cpu.xml > expanded-host-cpu.xml
virsh hypervisor-cpu-baseline --features host-cpu.xml > expanded-guest-cpu.xml

Even if you do this, though, the set in expanded-guest-cpu.xml still potentially doesn't quite match lscpu, because even with host-passthrough some features might be intentionally skipped because they block migration.
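Relatedly, if migration doesn't matter to you, newer libvirt lets you ask for those otherwise-skipped features too. A hedged domain XML sketch (the migratable attribute needs a reasonably recent libvirt and QEMU; check your versions' documentation):

```xml
<!-- With migratable='off', features that would normally be hidden because
     they block migration are also passed through to the guest. -->
<cpu mode='host-passthrough' migratable='off'/>
```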

Finally, the lscpu command uses different names for some features than libvirt and QEMU do, which makes comparisons "interesting".
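As a rough sketch (the rename table is illustrative, taken from the lists above, and not exhaustive), the spelling differences can be folded away mechanically before diffing two flag lists:

```shell
# Normalize QEMU/libvirt flag spellings towards lscpu's: dots and dashes
# become underscores, plus a few outright renames seen in the lists above.
normalize() {
    tr ' ' '\n' \
        | sed -e 's/[.-]/_/g' \
              -e 's/^pclmuldq$/pclmulqdq/' \
              -e 's/^tsc_deadline$/tsc_deadline_timer/' \
        | grep -v '^$' \
        | sort -u
}

printf 'sse4.1 tsc-deadline pclmuldq\n' | normalize
# prints:
# pclmulqdq
# sse4_1
# tsc_deadline_timer
```

You could then run each of the flag lists through such a filter and compare them with comm(1).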

Overall the message is that comparing features between all these different sources is incredibly painful and confusing :-(

DanielB