I've noticed an interesting phenomenon where compiler and linker flags affect the speed of the running code in ways I cannot explain.
I have a library that provides different implementations of the same algorithm so that I can compare how fast each of them runs.
Initially, I tested the setup with a pair of identical implementations to check that the right thing happened (both ran at roughly the same speed). I began by compiling the objects (one per implementation) with the following compiler flags:
-g -funroll-loops -flto -Ofast -Werror
and then during linking passed gcc the following flags:
-Ofast -flto=4 -fuse-linker-plugin
This produced a library that ran blazingly fast, but, curiously, the implementation from whichever object came first in the link arguments was reliably and repeatably ~7% faster (so either implementation was the faster one if it was linked first).
So with:
gcc -o libfoo.so -O3 -ffast-math -flto=4 -fuse-linker-plugin -shared support_obj.os obj1.os obj2.os -lm
vs
gcc -o libfoo.so -O3 -ffast-math -flto=4 -fuse-linker-plugin -shared support_obj.os obj2.os obj1.os -lm
In the first case, the implementation in obj1 ran faster than the implementation in obj2; in the second case, the converse was true. To be clear, the code is identical in both objects except for the function's entry-point name.
I then got rid of this strange link-order difference (and actually sped things up a bit) by removing the -Ofast flag from the link step.
I can reproduce mostly the same situation by replacing -Ofast with -O3 -ffast-math, but in that case I need to pass -ffast-math during linking as well, which brings back the strange ordering-dependent speed difference. I'm not sure why the speed-up is retained with -Ofast but not with -ffast-math when -ffast-math is not passed during linking, but I can accept that it might be down to link-time optimisation passing the relevant information along in one case but not the other. That still doesn't explain the speed disparity, though.
Removing -ffast-math altogether makes it run ~8 times slower.
Can anybody shed some light on what might be causing this effect? I'm really keen to understand what is behind this odd behaviour so that I don't accidentally trigger it later on.
The speed test is performed in Python, using a wrapper around the library and timeit. I'm fairly sure this is doing the right thing (I can swap the order of calls and similar to show that Python-side effects are negligible).
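For anyone wanting to reproduce the comparison, here is a minimal sketch of order-controlled timing with timeit. It assumes each implementation is exposed to Python as a zero-argument callable; the names `compare`, `impl_a` and `impl_b`, and the ctypes binding mentioned in the comment, are hypothetical and not the actual wrapper used here. It alternates which function is timed first in each round and keeps the best per-call time, so a consistent gap between the two cannot simply come from measurement order.

```python
import timeit


def compare(impl_a, impl_b, number=1000, repeat=7):
    """Time two callables, alternating which one runs first in each round.

    Returns (best_a, best_b, relative_diff): best per-call times in seconds,
    and relative_diff = best_b / best_a - 1 (positive means b is slower).
    """
    times_a, times_b = [], []
    for i in range(repeat):
        # Alternate the order so warm-up or cache effects left over from the
        # previous measurement are not always charged to the same function.
        pairs = [(impl_a, times_a), (impl_b, times_b)]
        if i % 2:
            pairs.reverse()
        for func, sink in pairs:
            # timeit.timeit accepts a callable and calls it `number` times.
            sink.append(timeit.timeit(func, number=number) / number)
    best_a, best_b = min(times_a), min(times_b)
    return best_a, best_b, best_b / best_a - 1.0


if __name__ == "__main__":
    # Hypothetical stand-ins; with the real library these might be lambdas
    # calling functions on a ctypes.CDLL("./libfoo.so") handle.
    a = lambda: sum(range(1000))
    b = lambda: sum(range(1000))
    best_a, best_b, rel = compare(a, b)
    print(f"a: {best_a:.3e} s, b: {best_b:.3e} s, b vs a: {rel:+.1%}")
```

Taking the minimum rather than the mean follows the usual timeit advice: the fastest run is the one least disturbed by other activity on the machine.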
I also tested the library's output for correctness, so I'm reasonably confident in that too.
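With -ffast-math, the compiler is allowed to reassociate floating-point operations, so results can legitimately differ from a reference in the last few bits. A correctness check on such builds is therefore usually done with a tolerance rather than exact equality. A minimal sketch in Python; the helper name `outputs_match` and the tolerance values are illustrative assumptions, not taken from the actual test suite:

```python
import math
from typing import Sequence


def outputs_match(result: Sequence[float], reference: Sequence[float],
                  rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Return True if two output sequences agree element-wise within tolerance.

    rel_tol/abs_tol are illustrative values and would need tuning for the
    real algorithm and its expected numerical error.
    """
    if len(result) != len(reference):
        return False
    return all(math.isclose(r, e, rel_tol=rel_tol, abs_tol=abs_tol)
               for r, e in zip(result, reference))


if __name__ == "__main__":
    # The same sum grouped two ways, as reassociation might do: the results
    # differ in the last bit, so == fails but the tolerance check passes.
    left = (0.1 + 0.2) + 0.3
    right = 0.1 + (0.2 + 0.3)
    print(left == right, outputs_match([left], [right]))
```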