When compiling programs to run inside a VM, what should march and mtune be set to?

Question

With VMs being slave to whatever the host machine is providing, what compiler flags should be provided to gcc?

I would normally think that -march=native is what you would use when compiling for a dedicated box, but the level of detail that -march=native goes into, as indicated in this article, makes me extremely wary of using it.

So... what to set -march and -mtune to inside a VM?

For a specific example...

My specific case right now is compiling Python (and more) in a Linux guest inside a KVM-based "cloud" host where I have no real control over the host hardware (aside from 'simple' stuff like CPU GHz, CPU count, and available RAM). Currently, cpuinfo tells me I've got an "AMD Opteron(tm) Processor 6176", but I honestly don't know (yet) whether that is reliable or whether the guest can get moved around to different architectures on me to meet the host's infrastructure shuffling needs (sounds hairy/unlikely).

All I can really guarantee is my OS, which is a 64-bit Linux kernel where uname -m yields x86_64.

Presumably the [cpuid](http://linux.die.net/man/1/cpuid) information should be leading for `-march`. KVM can synthesize a restricted CPUID and then move around the image to any hardware which matches or exceeds that restricted CPUID. E.g. if KVM claims SSE2, it can shuffle the virtual machine to any SSE2-supporting CPU. `-mtune` is a bit of a guess, as you have no idea what CPU to tune for. — MSalters, Oct 12 '15 at 11:39
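To make the "follow the CPUID the guest sees" idea concrete, here is a minimal Python sketch that checks whether the flags advertised in /proc/cpuinfo cover the features a chosen `-march` level would rely on. Everything named here is hypothetical: `cpu_flags`, `missing_features`, the sample cpuinfo text, and the `CORE2_FEATURES` set (an illustrative assumption based on the SSE-family features gcc documents for `-march=core2`, written with Linux flag names). It only reports what the guest advertises; it cannot tell you whether the provider will keep advertising it.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required):
    """Return the required flags the guest CPU does not advertise, sorted."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


# Illustrative assumption: SSE-family features documented for -march=core2,
# using Linux /proc/cpuinfo names ("pni" is how Linux reports SSE3).
CORE2_FEATURES = {"mmx", "sse", "sse2", "pni", "ssse3"}

# Hypothetical guest that advertises only a restricted feature set.
SAMPLE_CPUINFO = """\
processor\t: 0
model name\t: Example virtual CPU
flags\t\t: fpu cx8 cmov mmx fxsr sse sse2 syscall lm
"""

if __name__ == "__main__":
    # Anything printed here is a feature the chosen -march could use
    # but this guest does not claim to support.
    print(missing_features(SAMPLE_CPUINFO, CORE2_FEATURES))
```

On a real guest you would pass `open("/proc/cpuinfo").read()` instead of the sample text. An empty result means the advertised flags cover the set you asked about.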

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

Some incomplete and out-of-order excerpts from section 3.17.14, "Intel 386 and AMD x86-64 Options", of the GCC 4.6.3 manual (which I hope are pertinent).

-march=cpu-type
  Generate instructions for the machine type cpu-type.  
  The choices for cpu-type are the same as for -mtune.  
  Moreover, specifying -march=cpu-type implies -mtune=cpu-type. 

-mtune=cpu-type
  Tune to cpu-type everything applicable about the generated code,  
  except for the ABI and the set of available instructions.  
  The choices for cpu-type are:
    generic
      Produce code optimized for the most common IA32/AMD64/EM64T processors. 
    native
      This selects the CPU to tune for at compilation time by determining
      the processor type of the compiling machine. 
      Using -mtune=native will produce code optimized for the local machine
      under the constraints of the selected instruction set.
      Using -march=native will enable all instruction subsets supported by
      the local machine (hence the result might not run on different machines).

What I found most interesting is that specifying -march=cpu-type implies -mtune=cpu-type. My take on the rest was that if you are specifying both -march and -mtune, you're probably getting too close to tweak overkill.

My suggestion would be to just use -m64, and you should be safe enough since you're running inside an x86-64 Linux, correct?

~~But if you don't need to run in another environment and you're feeling lucky and fault tolerant then -march=native might also work just fine for you.~~

-m32
  The 32-bit environment sets int, long and pointer to 32 bits  
  and generates code that runs on any i386 system.     
-m64
  The 64-bit environment sets int to 32 bits and long and pointer  
  to 64 bits and generates code for AMD's x86-64 architecture.

For what it's worth ...

Out of curiosity I tried using the technique described in the article you referenced. I tested gcc v4.6.3 in 64-bit Ubuntu 12.04 which was running as a VMware Player guest. The VMware VM was running in Windows 7 on a desktop using an Intel Pentium Dual-Core E6500 CPU.

With -m64, gcc expanded the option to just -march=x86-64 -mtune=generic.

However, compiling with -march=native resulted in gcc using all of the much more specific compiler options below.

-march=core2 -mtune=core2 -mcx16 
-mno-abm -mno-aes -mno-avx -mno-bmi -mno-fma -mno-fma4 -mno-lwp 
-mno-movbe -mno-pclmul -mno-popcnt -mno-sse4.1 -mno-sse4.2 
-mno-tbm -mno-xop -msahf --param l1-cache-line-size=64 
--param l1-cache-size=32 --param l2-cache-size=2048

So, yes, as the gcc documentation states, when "Using -march=native ... the result might not run on different machines". To play it safe you should probably only use -m64 or its apparent equivalent, -march=x86-64 -mtune=generic, for your compiles.

I can't see how you would have any problem with this, since the intent of those compiler options is that gcc will produce code capable of running correctly on any x86-64/amd64 compliant CPU. (No?)

I am frankly astounded at how specific the gcc -march=native CPU options turned out to be. I have no idea how a CPU's L1 cache size being 32k could be used to fine tune the generated code. But apparently if there is a way to do this, then using -march=native will allow gcc to do it.

I wonder if this might result in any noticeable performance improvements?
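As an aside, the -march=native expansion quoted above can be picked apart mechanically, which makes it easier to compare what two different hosts would produce. The following is a minimal Python sketch, not anything that ships with gcc: `parse_native_expansion` and `CORE2_EXPANSION` are hypothetical names, and the parsing rules (treat `-mno-X` as disabled, any other `-mX` as enabled) are a simplifying assumption that only fits flag lists shaped like the one above.

```python
import shlex


def parse_native_expansion(flag_text):
    """Split a gcc option string into march, mtune, enabled, disabled, params."""
    result = {"march": None, "mtune": None,
              "enabled": [], "disabled": [], "params": {}}
    tokens = shlex.split(flag_text)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "--param" and i + 1 < len(tokens):
            # "--param name=value" arrives as two separate tokens.
            name, _, value = tokens[i + 1].partition("=")
            result["params"][name] = value
            i += 2
            continue
        if tok.startswith("-march="):
            result["march"] = tok[len("-march="):]
        elif tok.startswith("-mtune="):
            result["mtune"] = tok[len("-mtune="):]
        elif tok.startswith("-mno-"):
            result["disabled"].append(tok[len("-mno-"):])
        elif tok.startswith("-m"):
            result["enabled"].append(tok[2:])
        i += 1
    return result


# The expansion reported above for the Pentium Dual-Core E6500 guest.
CORE2_EXPANSION = """
-march=core2 -mtune=core2 -mcx16
-mno-abm -mno-aes -mno-avx -mno-bmi -mno-fma -mno-fma4 -mno-lwp
-mno-movbe -mno-pclmul -mno-popcnt -mno-sse4.1 -mno-sse4.2
-mno-tbm -mno-xop -msahf --param l1-cache-line-size=64
--param l1-cache-size=32 --param l2-cache-size=2048
"""

if __name__ == "__main__":
    info = parse_native_expansion(CORE2_EXPANSION)
    print(info["march"], info["enabled"], info["params"])
```

Running the same function over the expansion from a second machine and comparing the `enabled` lists gives a quick view of which instruction subsets a native build on one host would assume that the other host lacks.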

What is further interesting is that it appears -march=native can result in code that cannot execute in the same VM. https://issues.asterisk.org/jira/browse/ASTERISK-20128 hit, and hit me hard. — Jason Pyeron, Jan 21 '13 at 16:29

score 1 · Answer 2 · answered Apr 12 '12 at 23:25

One would like to think that the CPU architecture reported by the guest OS is what you should optimize for. Otherwise, I'd call it a bug. There can be decent reasons for bugs sometimes, but...

Note that not all hypervisors will necessarily be the same.

It might be a good idea to check on a mailing list for your specific hypervisor.
