The LLVM MISched instruction scheduler uses declarative TableGen descriptions of a processor's functional units, pipelines, and latencies. Imagine trying to derive the equivalent of the coding guidelines in Intel's Optimization Reference Manual from those declarations.
Broadly speaking, what are the objectives and techniques for statically scheduling code for OOO processors? When would the compiler schedule an instruction A before B, and when would it schedule A after B, for an OOO processor?
Superscalar processors can execute more than one instruction at a time. An in-order processor will only issue instructions in their original program order. An out-of-order (OOO) processor can execute instructions out of order and then commit the results in order. Speculation doesn't matter for this question, but I assume both kinds of processor are pipelined. Think Cortex-A53 (in-order) and Haswell (OOO).
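To make the difference concrete, here is a minimal C sketch (the function and variable names are mine, purely illustrative); read the statements as standing for machine instructions in this order. If the load of p[0] misses in cache, an in-order core cannot issue the independent add of x and y until the stalled add ahead of it issues, while an OOO core can execute it while the load is still in flight.

```c
/* Illustrative only: an independent computation trapped behind a
 * long-latency load when instructions must issue in program order. */
int demo(const int *p, int x, int y) {
    int a = p[0] + 1; /* depends on the load; stalls if the load misses    */
    int b = x + y;    /* independent of the load; an OOO core can run this
                         while the load is outstanding, an in-order core
                         cannot issue it past the stalled add above        */
    return a + b;
}
```

A static scheduler targeting the in-order core would move the independent add ahead of the load-dependent one; whether the same reordering still matters on an OOO core is part of what I'm asking.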
Which instruction an OOO processor will execute next is a scheduling decision made by the processor at run time, so this is usually called dynamic scheduling. Which instruction an in-order processor executes next was decided by the compiler when the program was compiled, so this is usually called static scheduling.
However, compilers also statically schedule code for OOO processors. In both the in-order and OOO cases, a compiler can look at a large window of instructions; in both cases it has to deal with register pressure; and in both cases it wants to keep the functional units busy. OOO processors typically also rename registers, which removes false (write-after-read/write-after-write) dependences and eases the pressure on architectural registers.
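As a concrete instance of keeping functional units busy colliding with register pressure, here is a sketch (names are illustrative, not from any real codebase) of a floating-point reduction unrolled with four independent accumulators: the extra accumulators create parallel dependence chains that a pipelined adder can overlap, but each one ties up an architectural register.

```c
/* Illustrative only: trading register pressure for instruction-level
 * parallelism by splitting one dependence chain into four.           */
float sum4(const float *a, int n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += a[i];     /* four independent accumulator chains: the adds  */
        s1 += a[i + 1]; /* can overlap in a pipelined FP adder instead of */
        s2 += a[i + 2]; /* serializing through a single accumulator       */
        s3 += a[i + 3];
    }
    for (; i < n; ++i)  /* leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

(The reassociation changes floating-point rounding, so a compiler would only do this under relaxed FP rules; the point here is just the trade-off between parallel chains and register use.)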
Given that an OOO processor schedules instructions dynamically, what should the ahead-of-time compiler do to help it?