I'm benchmarking the overhead of GCC Profile-Guided Optimization (PGO) instrumentation on the SPEC benchmarks, and I'm getting some odd results: two of the benchmarks run faster when instrumented.
The normal executable is compiled with: -g -O2 -march=native
The instrumented executable is compiled with: -g -O2 -march=native -fprofile-generate -fno-vpt
I'm using GCC 4.7 (the Google branch, to be precise). The machine running the benchmarks has an Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz.
The two benchmarks are bwaves, a Fortran benchmark, and libquantum, a C benchmark.
Here are the results:
bwaves-normal: 712.14
bwaves-instrumented: 697.22
=> ~2% faster
libquantum-normal: 463.88
libquantum-instrumented: 449.05
=> ~3.2% faster
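The percentages above are the relative reduction, (normal - instrumented) / normal. Here is a minimal Python sketch of that arithmetic, assuming the figures are run times where lower is better; the function name percent_faster is only illustrative:

```python
def percent_faster(normal, instrumented):
    """Relative reduction of the instrumented result versus the normal one, in percent."""
    return (normal - instrumented) / normal * 100.0


# Figures taken from the results listed above (assumed to be run times).
results = {
    "bwaves": (712.14, 697.22),
    "libquantum": (463.88, 449.05),
}

for name, (normal, instrumented) in results.items():
    print(f"{name}: {percent_faster(normal, instrumented):.1f}% faster")
```

This gives roughly 2.1% for bwaves and 3.2% for libquantum.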
I ran the benchmarks several times, thinking it could be a problem with my machine, but the results were consistent every time.
I would understand a very small overhead on some programs, but I don't see any reason for an improvement.
So my question is: how can the GCC-instrumented executable be faster than the normal optimized one?
Thanks