Can programs use (significantly) less memory when compiled for different processors?

Question

I have a C++ program I'm compiling for AMD64. Of course, different processors, despite being AMD64, support different features and instructions because they implement different microarchitectures. An easy way to optimise the program for one's own machine is to just use -march=native in Clang or GCC, but this isn't very portable for distribution's sake. A more portable solution would be to pick and choose specific target features.

This obviously affects performance (some processors support AVX-512, some don't, some support AVX2, some don't, etc.), but can this affect memory usage (heap/stack, not code size) in any significant way?

Of course yes: the binary size will depend on the instruction set used, and the binary lives in memory during execution. Is it significant? It depends on your specific conditions. Eons ago, the binary size mattered enough for CISC to be developed and preferred over RISC. — YSC, Jul 09 '21 at 11:02
`but can this affect memory usage in any significant way` - I don't think so, possibly subject to your specific values for _significant_. — 500 - Internal Server Error, Jul 09 '21 at 11:06
Ah I meant heap/stack memory usage, not code/binary size. I'll clarify in the original post. — mbarbar, Jul 09 '21 at 11:29
@500-InternalServerError I guess I'm thinking along the lines of something that isn't a margin of error. For argument's sake, let's say 5%-10%. My program regularly runs up to 100 GB (and more) on the heap, and 5 GB here would not be insignificant. — mbarbar, Jul 09 '21 at 11:30
As I see it, the question is: what's the data that takes so much space. Is your code filling your memory with `float32`'s? If the datatype is not touched, I would not expect a decrease in memory in that case. — André, Jul 09 '21 at 11:36

Peter Cordes · Accepted Answer · 2021-07-09T20:09:29.117

Different alignment rules or type widths are the two main ways you could get a difference, but `-march=` doesn't change either of those, not when compiling for the same ABI on the same ISA. (Otherwise `-march=skylake-avx512` code couldn't call `-march=sandybridge` code, or vice versa, because they would disagree on struct layouts.)
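To make that concrete, here is a minimal sketch of a record type whose layout is pinned down by the ABI rather than by the target microarchitecture. `Particle` and `heap_bytes` are made-up names for illustration, and the expected sizes assume the x86-64 System V ABI (4-byte `float`, 8-byte `double` with 8-byte alignment); on another ABI the `static_assert`s would need different numbers.

```cpp
#include <cassert>   // only needed for the runtime check in the note below
#include <cstddef>

// Hypothetical record type, not from the question; any plain struct behaves the same way.
struct Particle {
    float pos[3];  // 12 bytes, then 4 bytes of padding so `mass` is 8-byte aligned
    double mass;   // 8 bytes
    char tag;      // 1 byte, then 7 bytes of tail padding
};

// Bytes needed for an array of n records; depends only on sizeof(Particle).
constexpr std::size_t heap_bytes(std::size_t n) {
    return n * sizeof(Particle);
}

// The layout is fixed by the ABI (assumed here: x86-64 System V), so these
// hold no matter which -march= value this file is compiled with.
static_assert(sizeof(Particle) == 32, "unexpected ABI: size differs");
static_assert(alignof(Particle) == 8, "unexpected ABI: alignment differs");
```

A runtime check such as `assert(heap_bytes(1000) == 32000);` should pass whether the translation unit was built with `-march=sandybridge` or `-march=skylake-avx512`, since both must agree on the struct layout to be link-compatible.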

Compiling for a different ABI can save space, especially in pointer-heavy data structures. Specifically, an ILP32 ABI such as Linux x32 has 4-byte pointers instead of 8, so `struct foo { foo *next; int val; };` is 8 bytes instead of 16 (the 16 comes from padding `sizeof(foo)` up to a multiple of `alignof(foo)`, which it inherits from pointers needing 8-byte alignment). But that won't work for your use case of 100 GB of data; 32-bit pointers limit you to 4 GiB of address space.
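The arithmetic behind the 8-versus-16 figure can be sketched as follows. `round_up` and `node_size` are hypothetical helpers, and the sketch assumes a 4-byte `int` and a pointer alignment of at least 4, which is true of both x86-64 System V and x32 but is an assumption rather than a language guarantee.

```cpp
#include <cassert>   // only needed for the runtime check in the note below
#include <cstddef>

// The node type from the answer: one pointer plus one int.
struct foo {
    foo *next;
    int val;
};

// Round n up to the next multiple of align (align must be a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Hypothetical helper: expected sizeof(foo) for a given pointer size and
// alignment, assuming a 4-byte int and ptr_align >= 4.
constexpr std::size_t node_size(std::size_t ptr_size, std::size_t ptr_align) {
    return round_up(ptr_size + 4, ptr_align);
}

static_assert(node_size(8, 8) == 16, "LP64: 8 + 4 = 12, padded up to 16");
static_assert(node_size(4, 4) == 8,  "ILP32 (e.g. x32): 4 + 4 = 8, no padding");
```

On the ABI you are actually compiling for, `assert(sizeof(foo) == node_size(sizeof(foo *), alignof(foo *)));` is a quick way to confirm the model matches the real layout.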

`-march=` could have some small effect on stack space when auto-vectorizing. For example, a function might align the stack by 64 in order to spill/reload a ZMM vector, or, with older GCC, align it even if the final asm doesn't actually store or load any vectors to the stack frame. But that's at most an extra 56 bytes of wasted stack space per level of function nesting, vs. 16-byte alignment, which can be had for free as part of the calling convention.
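The 56-byte bound can be checked with a little arithmetic. The sketch below models rounding the stack pointer down to a multiple of the requested alignment; `align_down_waste` and `worst_case_waste` are made-up names, and the model assumes the x86-64 System V convention that RSP is 8 modulo 16 at function entry. It needs C++14 or later for the loop inside a `constexpr` function.

```cpp
#include <cassert>   // only needed for the runtime check in the note below
#include <cstdint>

// Bytes skipped when a stack pointer is rounded down to a multiple of `align`
// (the effect of `and rsp, -align`). `align` must be a power of two.
constexpr std::uint64_t align_down_waste(std::uint64_t sp, std::uint64_t align) {
    return sp & (align - 1);
}

// Worst case over the stack-pointer values possible at function entry.
// Assumption: x86-64 System V, where RSP % 16 == 8 on entry because the
// caller's 16-byte-aligned RSP has just had the return address pushed.
constexpr std::uint64_t worst_case_waste(std::uint64_t align) {
    std::uint64_t worst = 0;
    for (std::uint64_t sp = 8; sp < 8 + 4 * align; sp += 16) {
        const std::uint64_t waste = align_down_waste(sp, align);
        if (waste > worst) worst = waste;
    }
    return worst;
}

static_assert(worst_case_waste(64) == 56, "64-byte alignment for ZMM spills");
static_assert(worst_case_waste(32) == 24, "32-byte alignment for YMM spills");
```

Even a call chain a thousand frames deep would lose at most about 56 kB this way (`assert(1000 * worst_case_waste(64) == 56000);`), which is noise next to a 100 GB heap.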

GCC's and Clang's optimizers won't, AFAIK, do any optimizations that change the size of dynamic allocations. Clang can sometimes optimize away a dynamic allocation entirely, for example in a function that creates and destroys a `std::vector<float> foo(100);` where all accesses to it can be optimized away. (e.g. if you store constants into the vector and then read them back, it can optimize those accesses away and then eliminate the allocation, too. The same goes for a `std::vector` that isn't even used.)
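A function of roughly that shape might look like the sketch below. `two_element_sum` is a made-up name, and whether a particular compiler version actually removes the allocation is not guaranteed; it depends on the optimizer, the flags, and the standard library in use, so treat this as something to test rather than rely on.

```cpp
#include <cassert>   // only needed for the runtime check in the note below
#include <vector>

// A function in the shape the answer describes: the vector never escapes,
// and every load from it reads a constant that was just stored.
float two_element_sum() {
    std::vector<float> foo(100);  // heap allocation, zero-initialised
    foo[0] = 1.5f;
    foo[99] = 2.5f;
    return foo[0] + foo[99];      // always 4.0f
}                                 // foo is destroyed (and its memory freed) here
```

The observable result is the same either way (`assert(two_element_sum() == 4.0f);`). To see whether the allocation survived, compile with something like `clang++ -O2 -S` and look for calls to `operator new` in the generated assembly. Note that this kind of elision removes an allocation outright; it never shrinks one, and it is not tied to the `-march=` setting.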

Possibly a different allocator library that's better at reducing internal fragmentation could save space, if you end up with some memory pages allocated but not fully used. But that's not something -march= affects.
