Write this on your blackboard: Nothing in a compiler is simple. If you have no blackboard, write it on your whiteboard, forearm or the door of a convenient bathroom stall.
SSA is extremely convenient for algorithms that reason about code, which includes practically all optimisations and all analysis. I would say that SSA is as close to "always" as anything in a compiler ever is.
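To make the convenience concrete, here's a minimal sketch (hypothetical names and representation, not LLVM's actual data structures): because every SSA name is assigned exactly once, mapping a use back to its unique definition is a plain dictionary lookup, which makes an analysis like constant folding almost trivial to write.

```python
# Hypothetical toy SSA representation: (name, opcode, operands...).
# Every name is assigned exactly once, so the use-def map is just a dict.

ssa_instructions = [
    ("x1", "const", 40),        # x1 = 40
    ("y1", "const", 2),         # y1 = 2
    ("z1", "add", "x1", "y1"),  # z1 = x1 + y1
]

# One definition per name, by construction of SSA form.
definitions = {ins[0]: ins for ins in ssa_instructions}

def fold_constants(name):
    """Trivial constant folding over the SSA use-def map."""
    ins = definitions[name]
    if ins[1] == "const":
        return ins[2]
    if ins[1] == "add":
        return fold_constants(ins[2]) + fold_constants(ins[3])
    raise ValueError("unknown opcode")

print(fold_constants("z1"))  # z1 folds to 40 + 2 = 42
```

Without the single-assignment guarantee, `definitions` would have to be a far more complicated reaching-definitions analysis; that's the convenience in a nutshell.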
But of course some optimisations run at a very late stage of compilation, after SSA form is gone: even though SSA is generally extremely convenient for reasoning, that doesn't make it necessarily the most convenient form for every instance of reasoning. It's close, but…
Suppose a fictional LLVM backend produces the three assembly instructions 'add r1, r2, r3', 'mv r3, r4' and 'add r1, r5, r3', where the destination register is the last one. You may then observe that if the first instruction were changed to 'add r1, r2, r4', the second one could be removed: the value the first instruction writes to r3 is only ever copied to r4, and r3 is overwritten by the third instruction anyway. This is called peephole optimisation, and some LLVM backends do contain peephole optimisers that work after register allocation. (I'm fairly sure I've seen either the ARM or x86 backend perform peephole optimisation twice, both before and after register allocation. Compilers are never simple.)
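That transformation can be sketched in a few lines. This is a hypothetical toy pass over `(op, sources..., dest)` tuples, not anything from LLVM: it only inspects a three-instruction window, whereas a real pass would use proper liveness information rather than requiring the clobbering instruction to follow immediately.

```python
# Toy peephole pass over (op, src..., dst) tuples. When an instruction's
# destination is immediately copied elsewhere by 'mv' and then clobbered
# without being read, retarget the first write and drop the 'mv'.

def peephole(instructions):
    out = list(instructions)
    i = 0
    while i + 2 < len(out):
        first, second, third = out[i], out[i + 1], out[i + 2]
        dst = first[-1]
        if (second[0] == "mv" and second[1] == dst
                and third[-1] == dst            # dst is clobbered again,
                and dst not in third[1:-1]):    # and not read in between
            out[i] = first[:-1] + (second[2],)  # retarget the first write
            del out[i + 1]                      # the 'mv' is now dead
        else:
            i += 1
    return out

code = [
    ("add", "r1", "r2", "r3"),
    ("mv", "r3", "r4"),
    ("add", "r1", "r5", "r3"),
]
print(peephole(code))  # the mv disappears; the first add now targets r4
```

Note that the pass never mentions SSA at all: after register allocation it has to reason about physical registers being overwritten, which is exactly the kind of reasoning SSA would have made unnecessary earlier in the pipeline.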
So, even though it's not 100% true, you can very nearly say that phi nodes remain in the code until the final native machine code is generated. Because if someone wants to add a clever analysis, transformation or optimisation, it's nearly guaranteed that they will insert the new code before phi nodes are removed and registers are allocated.