What is the difference between clang -O1 and opt -O1?
I have observed that these two commands behave significantly differently.
Context
I would like to test LLVM optimization passes. More specifically, I would like to pick a subset of the -O1 passes such that 1) the subset performs as well as the full -O1 pipeline, and 2) the correctness of each selected pass is easy to reason about.
To test the performance of a subset, I wrote a shell script like this:
clang -o a.bc -emit-llvm -c a.c
opt (..., optimizations like -adce, ...) a.bc >a.opt.bc
clang -o a a.opt.bc
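The script above can be generalized so that the pass subset is a parameter. The following is a minimal sketch, not the exact script used here. It assumes that clang and opt come from the same, reasonably recent LLVM release and are on PATH, and it uses the new pass manager syntax (opt -passes=...). Older releases take individual flags such as -adce, as in the commands above. The names run and build_subset, the DRY_RUN switch, and the example pass lists are hypothetical. It also assumes that recent clang versions mark every function optnone when compiling at the default -O0, which makes opt skip those functions. The cc1 option -Xclang -disable-O0-optnone suppresses that attribute.

```shell
#!/usr/bin/env bash
# Sketch: build a C file with only a chosen subset of LLVM IR passes.
# Assumptions: clang and opt from the same LLVM release are on PATH, and that
# release accepts the new pass manager syntax `opt -passes=...`.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1 (useful for checking the pipeline).
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_subset SRC PASSES OUT   (hypothetical helper)
#   SRC     C source file, e.g. a.c
#   PASSES  comma-separated opt pipeline, e.g. "mem2reg,adce"
#   OUT     name of the resulting executable; intermediates are OUT.bc and OUT.opt.bc
build_subset() {
  local src=$1 passes=$2 out=$3
  local bc="${out}.bc" optbc="${out}.opt.bc"
  # Front end at the default -O0; assumed: without this cc1 flag, functions are
  # tagged optnone and opt would skip them.
  run clang -Xclang -disable-O0-optnone -emit-llvm -c "$src" -o "$bc"
  # Apply only the chosen IR passes.
  run opt -passes="$passes" "$bc" -o "$optbc"
  # Lower the optimized bitcode to an executable (backend at clang's default level).
  run clang "$optbc" -o "$out"
}
```

For example, build_subset a.c "mem2reg,instcombine,simplifycfg" a && time ./a builds and times one candidate subset, while DRY_RUN=1 build_subset a.c "mem2reg,adce" a only prints the three commands.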
After many attempts, I found that
clang -o a.bc -emit-llvm -c a.c
opt -O1 a.bc >a.opt.bc
clang -o a a.opt.bc
and
clang -O1 -o a a.c
emit significantly different binaries. The latter is much faster: for example, one program takes 49 seconds to run when built with the former and 29 seconds when built with the latter.
Tried Approaches
I searched for what clang -O1 means and found some references such as Clang optimization levels, but that article is really about opt, not clang.
I tried to find official documentation for clang, but without success.
I tried to understand the clang source code, but I could not work it out.
Found Facts
I tried
clang -o a.bc -emit-llvm -c a.c
opt -mem2reg -O1 a.bc >a.opt.bc
clang -o a a.opt.bc
because a reference (Clang optimization levels) said that opt -O1 does not include the mem2reg pass. This closed some of the gap, but not all of it (49 s -> 40 s). My guess is that clang -O1 performs some preliminary optimizations, such as mem2reg, before running the -O1 pipeline itself.
I also tried
clang -o a.bc -emit-llvm -c a.c
opt -mem2reg -O1 a.bc >a.opt.bc
clang -O1 -o a a.opt.bc
because I expected some target-dependent optimizations to run after the LLVM IR passes. It worked: 40 s -> 26 s, which is even faster than the 29 s of plain clang -O1.
Conclusion
In conclusion, my guess is that clang -O1 runs some steps before and after the LLVM IR passes that opt -O1 does not. Does anyone know the difference between clang -O1 and opt -O1? Any pointers to official documentation or source code, or other ways to solve my original problem, would be much appreciated.