OK, so a compiler is free to reorder code fragments for performance reasons. Let's suppose some code snippet, translated directly into machine code with no optimizations applied, looks like this:
machine_instruction_1
machine_instruction_2
machine_instruction_3
machine_instruction_4
machine_instruction_5
but a smart compiler decides the original order is highly inefficient and reorders the same code so that the new order of resulting machine instructions is as follows:
machine_instruction_5
machine_instruction_4
machine_instruction_3
machine_instruction_2
machine_instruction_1
So far so good.
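To make the first layer concrete, here is a minimal C++ sketch. The globals `a` and `b` and both function names are hypothetical, chosen only for illustration. The two stores do not depend on each other, so under the as-if rule a compiler is allowed to emit them in either order; which order a given compiler actually picks is not something this sketch asserts.

```cpp
#include <cassert>

// Hypothetical globals, used only for illustration.
int a = 0;
int b = 0;

// The two stores are independent of each other, so the compiler may
// emit them in either order without changing single-threaded behavior.
void independent_stores() {
    a = 1;
    b = 2;
}

// A single-threaded caller cannot tell which store was emitted first:
// once independent_stores() returns, both must have taken effect.
int sum_after_stores() {
    independent_stores();
    assert(a == 1 && b == 2);
    return a + b;
}
```

Calling `sum_after_stores()` yields 3 whichever order the stores were emitted in, which is the sense in which the reordering preserves the code logic.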
Here's where the tricky part starts. The resulting machine instructions will be executed by a CPU, which is free to reshuffle them once again in any way it finds appropriate for performance reasons, as long as the code logic is preserved. Since we're dealing with two "layers" of instruction reordering:
- the first one, due to compiler optimizations
- the second one, due to CPU out-of-order execution
what makes the compile-time instruction reordering relevant at all? All the CPU sees is a sequence of raw machine instructions, with no indication of any prior optimizations performed by the compiler. If the CPU introduces its own "layer" of reordering, how come it doesn't invalidate the order of instructions set by the compiler? Basically, what forces the CPU to respect compiler optimizations? How do compile-time reordering and run-time reordering "cooperate", and how does the latter complement the former?
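One setting where both layers become observable is code shared between threads. The following is a minimal sketch, assuming C++11 atomics; `payload`, `ready`, `producer`, `consumer` and `run_once` are hypothetical names. The release store and acquire load act as a single source-level request aimed at both layers: the compiler must not move the `payload` write past the release store in its own reordering, and it must also emit whatever instructions or fences the target CPU needs so that the hardware does not make the write visible late either.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state, for illustration only.
int payload = 0;
std::atomic<bool> ready{false};

// Writer: the plain store to payload must become visible before the flag.
void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Reader: once the acquire load observes true, the payload write
// made before the release store is guaranteed to be visible.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one writer and one reader and returns what the reader saw.
int run_once() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread writer(producer);
    std::thread reader([&seen] { seen = consumer(); });
    writer.join();
    reader.join();
    return seen;
}
```

Within a single thread neither layer needs this kind of help, since both the compiler and the CPU have to preserve the single-threaded result on their own.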