Why the optimization, division-by-constant is not implemented in LLVM IR?

Question

According the source code *1 give below and my experiment, LLVM implements a transform that changes the division to multiplication and shift right.

In my experiment, this optimization is applied at the backend (because I saw that change on X86 assembly code instead of LLVM IR).

I know this change may be associated with the hardware. In my point, in some hardware, the multiplication and shift right maybe more expensive than a single division operator. So this optimization is implemented in backend.

But when I search the DAGCombiner.cpp, I saw a function named isIntDivCheap(). And in the definition of this function, there are some comments point that the decision of cheap or expensive depends on optimizing base on code size or the speed.

That is, if I always optimize the code base on the speed, the division will convert to multiplication and shift right. On the contrary, the division will not convert.

In the other hand, a single division is always slower than multiplication and shift right or the function will do more thing to decide the cost.

So, why this optimization is NOT implemented in LLVM IR if a single division always slower?

*1: https://llvm.org/doxygen/DivisionByConstantInfo_8cpp.html

Can you name the architecture when division is cheaper than multiply+shift? — Anton Korobeynikov, Apr 11 '22 at 07:28

score 1 · Accepted Answer · answered Apr 17 '22 at 12:13

Interesting question. According to my experience of working on LLVM front ends for High-level Synthesis (HLS) compilers, the answer to your questions lies in understanding the LLVM IR and the limitations/scope of the optimizations at LLVM IR stage.

The LLVM Intermediate Representation (IR) is the backbone that connects frontends and backends, allowing LLVM to parse multiple source languages and generate code to multiple targets. Hence, at the LLVM IR stage, it's often about intent rather than full-fledge performance optimizations.

Divide-by-constant optimization is very much performance driven. Not saying at all that optimizations at IR level have less or nothing to do with performance, however, there are inherent limitations in terms of optimizations at IR stage and divide-by-constant is one of those limitations.

To be more precise, the IR is not entrenched enough in low-level machine details and instructions. If you observe that the optimizations at LLVM IR are usually composed of analysis and transform passes. And as per my knowledge, you don't see divide-by-constant pass at the IR stage.

Why the optimization, division-by-constant is not implemented in LLVM IR?

1 Answers1