4

Probably everyone uses some kind of optimization switches (in case of gcc, the most common one is -O2 I believe).

But what does gcc (and other compilers like VS, Clang) really do in presence of such options?

Of course there is no definite answer, since it depends very much on platform, compiler version, etc. However, if possible, I would like to collect a set of "rules of thumb". When should I think about some tricks to speed-up the code and when should I just leave the job to the compiler?

For example, how far will the compiler go in such (slightly artificial...) cases, for different optimization levels:

1) sin(3.141592) // will it be evaluated at compile time or should I think of a look-up table to speed-up the calculations?

2) int a = 0; a = exp(18), cos(1.57), 2; // will the compiler evaluate exp and cos, although not needed, as the value of the expression equals 2?

3)

for (size_t i = 0; i < 10; ++i) {
  int a = 10 + i;
}

// will the compiler skip the whole loop as it has no visible side-effects?

Maybe you can think of other examples.

user765572
  • I don't think is a question suitable for SO - it's not a question. – Lubo Antonov Sep 07 '12 at 11:07
    @LuboAntonov: I think it's a perfect question for SO. He wants objective answers to three questions about compiler optimization. Other than observing the generated assembly, there isn't a clear way to know if you're new to it all. – Josh Sep 07 '12 at 11:09
    The best advice is to compile the code to assembly language using the -S option (for gcc) and see what code it generates. Of course you need to learn the basics of assembly language but it's not that hard to understand. – jcoder Sep 07 '12 at 11:10
    Have you read the documentation on gcc? It explains in quite some detail what -ON enables, and those options often have quite detailed explanations already in the man page. Also, optimizations improve with compiler versions and/or with the way your compiler was compiled, so no human can ever predict what compilers will do in general; you have to look at the generated assembler. gcc.godbolt.org is a nice tool to do so. – PlasmaHH Sep 07 '12 at 11:13
  • I think you should almost always trust your compiler. It will do *a lot* of improvements to the code. Not just little tricks. See an example in [this answer to another question](http://stackoverflow.com/a/11639305/597607) on low level optimizations. – Bo Persson Sep 07 '12 at 11:13
    @Josh It's an open-ended question, even the OP admits that the answer would differ for each compiler, calls for opinions, he wants to essentially start a wiki discussion. I think that hits about all the reasons why questions get closed here. – Lubo Antonov Sep 07 '12 at 11:16
    Starting on page 73, this document: http://www.agner.org/optimize/optimizing_cpp.pdf lists optimizations available in various compilers. The whole thing is a bit heavy going though. This (rather old) article: http://www.linuxjournal.com/article/7269 has a list of optimizations at each level for gcc. This manual page shows something similar: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html – BoBTFish Sep 07 '12 at 11:23
  • @LuboAntonov: There are compiler fundamentals though, so it is up to us to speak about what is generally possible and to teach the OP how to verify in her case. – Matthieu M. Sep 07 '12 at 11:32
    @MatthieuM. - the question is overly broad: a full answer could fill a book, or several. – Joris Timmermans Sep 07 '12 at 11:38

3 Answers

6

If you want to know what a compiler does, your best bet is to have a look at the compiler documentation. For optimizations, you may look at LLVM's Analysis and Transform Passes, for example.

1) sin(3.141592) // will it be evaluated at compile time ?

Probably. The semantics of IEEE 754 floating-point computations are precisely specified, so the compiler is free to evaluate the call at compile time. This might be surprising if you change the FPU flags (for example, the rounding mode) at runtime, by the way: the precomputed value won't reflect them.

2) int a = 0; a = exp(18), cos(1.57), 2;

It depends:

  • whether the functions exp and cos are inline or not
  • if they are not, whether they are correctly annotated (so the compiler knows they have no side effects)

Functions from the C or C++ Standard library should be correctly recognized/annotated.

As for the elimination of the computation:

  • -adce: Aggressive Dead Code Elimination
  • -dce: Dead Code Elimination
  • -die: Dead Instruction Elimination
  • -dse: Dead Store Elimination

Compilers love finding code that is useless :)

3)

Similar to 2), actually. The result of the store is not used and the expression has no side effects.

  • -loop-deletion: Delete dead loops

And finally: why not put the compiler to the test?

#include <math.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
  double d = sin(3.141592);
  printf("%f", d);

  int a = 0; a = (exp(18), cos(1.57), 2); /* need parentheses here */
  printf("%d", a);

  for (size_t i = 0; i < 10; ++i) {
    int a = 10 + i;
  }

  return 0;
}

Clang tries to be helpful already during the compilation:

12814_0.c:8:28: warning: expression result unused [-Wunused-value]
  int a = 0; a = (exp(18), cos(1.57), 2);
                           ^~~ ~~~~
12814_0.c:12:9: warning: unused variable 'a' [-Wunused-variable]
    int a = 10 + i;
        ^

And the emitted code (LLVM IR):

@.str = private unnamed_addr constant [3 x i8] c"%f\00", align 1
@.str1 = private unnamed_addr constant [3 x i8] c"%d\00", align 1

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable {
  %1 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([3 x i8]* @.str, i64 0, i64 0), double 0x3EA5EE4B2791A46F) nounwind
  %2 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([3 x i8]* @.str1, i64 0, i64 0), i32 2) nounwind
  ret i32 0
}

We remark that:

  • as predicted the sin computation has been resolved at compile-time
  • as predicted the exp and cos have been stripped completely.
  • as predicted the loop has been stripped too.

If you want to delve deeper into compiler optimizations I would encourage you to:

  • learn to read IR (it's incredibly easy, really, much more so than assembly)
  • use the LLVM Try Out page to test your assumptions
Matthieu M.
1

The compiler has a number of optimization passes. Every optimization pass is responsible for a number of small optimizations. For example, you may have a pass that calculates arithmetic expressions at compile time (so that you can express 5MB as 5 * (1024*1024) without a penalty, for example). Another pass inlines functions. Another searches for unreachable code and kills it. And so on.

The developers of the compiler then decide which of these passes they want to execute in which order. For example, suppose you have this code:

int foo(int a, int b) {
  return a + b;
}

void bar() {
  if (foo(1, 2) > 5)
    std::cout << "foo is large\n";
}

If you run dead-code elimination on this, nothing happens. Similarly, if you run expression reduction, nothing happens. But the inliner might decide that foo is small enough to be inlined, so it substitutes the call in bar with the function body, replacing arguments:

void bar() {
  if (1 + 2 > 5)
    std::cout << "foo is large\n";
}

If you run expression reduction now, it will first decide that 1 + 2 is 3, and then decide that 3 > 5 is false. So you get:

void bar() {
  if (false)
    std::cout << "foo is large\n";
}

And now the dead-code elimination will see an if(false) and kill it, so the result is:

void bar() {
}

But now bar is suddenly very tiny, when it was larger and more complicated before. So if you run the inliner again, it would be able to inline bar into its callers. That may expose yet more optimization opportunities, and so on.

For compiler developers, this is a trade-off between compile time and generated code quality. They decide on a sequence of optimizers to run, based on heuristics, testing, and experience. But since one size does not fit all, they expose some knobs to tweak this. The primary knob for gcc and clang is the -O option family. -O1 runs a short list of optimizers; -O3 runs a much longer list containing more expensive optimizers, and repeats passes more often.

Aside from deciding which optimizers run, the options may also tweak internal heuristics used by the various passes. The inliner, for example, usually has lots of parameters that decide when it's worth inlining a function. Pass -O3, and those parameters will lean more towards inlining functions whenever there is a chance of improved performance; pass -Os, and the parameters will cause only really tiny functions (or functions provably called exactly once) to be inlined, as anything else would increase executable size.

Sebastian Redl
0

Compilers do all sorts of optimizations that you cannot even think of. Especially C++ compilers.

They do things like unrolling loops, inlining functions, eliminating dead code, replacing multiple instructions with just one, and so on.

A piece of advice I can give: you can trust C/C++ compilers to perform a lot of optimizations.

Take a look at [1].

[1] http://en.wikipedia.org/wiki/Compiler_optimization

coredump