Write this on your blackboard: Nothing in a compiler is simple. If you have no blackboard, write it on your whiteboard, forearm or the door of a convenient bathroom stall.
SSA is extremely convenient for algorithms that reason about code, which includes practically all optimisations and all analysis. I would say that SSA is as close to "always" as anything in a compiler ever is.
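To make the convenience concrete, here's a minimal sketch (hypothetical names and representation, not LLVM's actual data structures): because every SSA name is assigned exactly once, mapping a use back to its unique definition is a plain dictionary lookup, which makes an analysis like constant folding almost trivial to write.

```python
# Hypothetical toy SSA representation: (name, opcode, operands...).
# Every name is assigned exactly once, so the use-def map is just a dict.

ssa_instructions = [
    ("x1", "const", 40),        # x1 = 40
    ("y1", "const", 2),         # y1 = 2
    ("z1", "add", "x1", "y1"),  # z1 = x1 + y1
]

# One definition per name, by construction of SSA form.
definitions = {ins[0]: ins for ins in ssa_instructions}

def fold_constants(name):
    """Trivial constant folding over the SSA use-def map."""
    ins = definitions[name]
    if ins[1] == "const":
        return ins[2]
    if ins[1] == "add":
        return fold_constants(ins[2]) + fold_constants(ins[3])
    raise ValueError("unknown opcode")

print(fold_constants("z1"))  # z1 folds to 40 + 2 = 42
```

Without the single-assignment guarantee, `definitions` would have to be a far more complicated reaching-definitions analysis; that's the convenience in a nutshell.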
But of course some optimisations run at a very late stage of compilation, after SSA form is gone: even though SSA is generally extremely convenient for reasoning, that doesn't make it necessarily the most convenient form for every instance of reasoning. It's close, but…
Suppose a fictional LLVM backend produces the three assembly instructions 'add r1, r2, r3', 'mv r3, r4' and 'add r1, r5, r3', where the destination register is the last one. You may then observe that if the first instruction were changed to 'add r1, r2, r4', the second one could be removed: the value the first instruction writes to r3 is only ever copied to r4, and r3 is overwritten by the third instruction anyway. This is called peephole optimisation, and some LLVM backends do contain peephole optimisers that work after register allocation. (I'm fairly sure I've seen either the ARM or x86 backend perform peephole optimisation twice, both before and after register allocation. Compilers are never simple.)
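That transformation can be sketched in a few lines. This is a hypothetical toy pass over `(op, sources..., dest)` tuples, not anything from LLVM: it only inspects a three-instruction window, whereas a real pass would use proper liveness information rather than requiring the clobbering instruction to follow immediately.

```python
# Toy peephole pass over (op, src..., dst) tuples. When an instruction's
# destination is immediately copied elsewhere by 'mv' and then clobbered
# without being read, retarget the first write and drop the 'mv'.

def peephole(instructions):
    out = list(instructions)
    i = 0
    while i + 2 < len(out):
        first, second, third = out[i], out[i + 1], out[i + 2]
        dst = first[-1]
        if (second[0] == "mv" and second[1] == dst
                and third[-1] == dst            # dst is clobbered again,
                and dst not in third[1:-1]):    # and not read in between
            out[i] = first[:-1] + (second[2],)  # retarget the first write
            del out[i + 1]                      # the 'mv' is now dead
        else:
            i += 1
    return out

code = [
    ("add", "r1", "r2", "r3"),
    ("mv", "r3", "r4"),
    ("add", "r1", "r5", "r3"),
]
print(peephole(code))  # the mv disappears; the first add now targets r4
```

Note that the pass never mentions SSA at all: after register allocation it has to reason about physical registers being overwritten, which is exactly the kind of reasoning SSA would have made unnecessary earlier in the pipeline.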
So, even though it's not 100% true, you can very nearly say that phi nodes remain in the code until the final native machine code is generated. Because if someone wants to add a clever analysis, transformation or optimisation, it's nearly guaranteed that they will insert the new code before phi nodes are removed and registers are allocated.