What's the relation between instruction reordering done by a compiler and instruction reordering done by a cpu?

Question

OK, so a compiler is free to reorder code fragments for performance reasons. Let's suppose some code snippet, translated directly into machine code with no optimizations applied, looks like this:

machine_instruction_1
machine_instruction_2
machine_instruction_3
machine_instruction_4
machine_instruction_5

but a smart compiler decides the original order is highly inefficient and reorders the same code so that the new order of resulting machine instructions is as follows:

machine_instruction_5
machine_instruction_4
machine_instruction_3
machine_instruction_2
machine_instruction_1

So far so good.

Here's where the tricky part starts. The resulting machine instructions will be executed by a cpu which is free to reshuffle them once again any way it finds appropriate for performance reasons, as long as the code logic is preserved. Since we're dealing with two "layers" of instruction reordering:

the first one, due to compiler optimizations
the second one, due to cpu out-of-order execution

what makes the compile-time instruction reordering relevant at all? All the cpu sees is a sequence of raw machine instructions, with no indication of any prior optimizations performed by the compiler. If cpu introduces its own "layer" of reordering, how come it doesn't invalidate the order of instructions set by the compiler? Basically, what forces cpu to respect compiler optimizations? How do compile-time reordering and run-time reordering "cooperate", how does the latter complement the former?

The CPU doesn't reorder the instructions, it execute them out-of-order. It means it can "start" a later instruction before an earlier one "finishes" (the precise meaning of the terms is a bit technical). However the order in which the CPU encounter the instructions is still relevant because earlier instructions are first to occupy the resources (e.g. is mi1 requires resource A for 5 cycles and mi2 requires A for 1 cycle and B for 4 cycles, scheduling mi1+mi2 would take 10cycles while mi2+mi1 would require 6 cycles). — Margaret Bloom, Jan 18 '19 at 09:12
@MargaretBloom But isn't it effectively the same thing? Suppose the original order is "instruction1, instruction2" and after compiler optimization the new order becomes "instruction2, instruction1". Now, even though a cpu encounters instruction2 before instruction1, it could still execute them out of order, e.g. execute instruction1 before instruction2, thus effectively ignore compiler's optimization. Perhaps I fail to grasp a difference between how a compiler shuffles instructions and how a cpu executes them out of order. A small example would definitely help. — Peter, Feb 13 '19 at 19:37

Alain Merigot · Answer 1 · 2019-01-18T08:17:03.110

When considering instruction execution what must be respected is the program semantic. Any ordering is correct, as long as this is respected. Concretely, this is described by "dependencies" that indicate if some instructions require a given order with respect to the program correct behavior. For instance consider the following program

1 x <= y+3
2 z <= 2*x
3 w = 5*y
4 y = 2*a

Instructions 1 and 2 are dependent. If their relative ordering is modified, the program is not compliant with what wanted the programmer and any reordering is forbidden. For different reasons, 4 cannot be executed unchanged before y has been used by 1 and 3. There are different kind of dependancies, including when considering control flow.

Compiler and hardware try to reorder programs in order to improve their efficiency, while respecting dependencies. Indeed their actions is complementary.

Compiler can consider larger reorganisations than processor and uses more complex heuristics to do it. A compiler can have a large view on a program and reorder large part of code. In theory, an instruction at a distance of, say, 1000 can be displaced, if there is no dependency violation and the compiler considers that it can improve program execution. It can completely reorganize code, unroll loops, etc. On the contrary, the processor has a relatively limited window of prefetched instructions that can be considered for reordering and any rearrangement can only concerns close instructions and is based on simple methods doable within a cycle.

But the processor has a great advantage. It can do dynamic reordering and respond to random events. At a given time, it considers what instruction can be executed with respect to dependencies and data availability and reorders the code accordingly. This is based on dynamic dependencies, and, for instance, if the result of a previous dependant instruction is not available because of a cache miss, it will execute other instructions that respect dependencies. Successive runs of the same program with the same input data can lead to different ordering depending on branch mispredictions, cache misses, etc.

So there is no competition between compiler and processor, but an efficient collaboration.

Also the CPU can optimistically do computations in advance and then trash them if they become invalid due to a "surprise" dependency. But the compiler that would try that could hugely pessimize the code as it doesn't know when computation units become available on the exact model used to run the code (unless compiled with a precise abstract model of that CPU, if such model is even available and not trade secret). — curiousguy, Jan 20 '20 at 22:30

What's the relation between instruction reordering done by a compiler and instruction reordering done by a cpu?

1 Answers1