3

Here are two versions of a trivial arithmetic expression evaluator (playground link: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=d3da06b0077b29e0e3ac85720c567dd8). The second version uses a recursive call to reuse some code (obviously not worth the effort in this toy example, but that's why it's a toy example). I'm convinced that a sufficiently smart compiler could realize that the recursive call to eval_slow(Sum(...)) does not itself make any further recursive calls and is therefore safe to inline, so eval_slow should compile to the same assembly as eval_fast. In practice, rustc currently does not perform this optimization, and eval_slow contains a recursive call (see the assembly output in the playground link).

Are there optimizing compilers that are capable of performing this type of optimization (for any language)? Is there a name for this type of optimization in the compiler literature? Is this likely to be optimized correctly in the near future, or is (the general version of) this a very hard open problem?

pub enum Expr {
    Lit(isize),
    Sum(isize, isize),
    Sub(isize, isize),
}

// simple and fast
pub fn eval_fast(expr: Expr) -> isize {
    use Expr::*;
    match expr {
        Lit(x) => x,
        Sum(x, y) => x + y,
        Sub(x, y) => x - y
    }
}

// we'd like to inline the recursive call to `eval_slow(Sum(...))`, but it doesn't happen. 
pub fn eval_slow(expr: Expr) -> isize {
    use Expr::*;
    match expr {
        Lit(x) => x,
        Sum(x, y) => x + y,
        Sub(x, y) => eval_slow(Sum(x, -y))
    }
}

NOTE: also tagging this as C++ since my questions aren't rust-specific, though my example is (and afaik it's quite possible that this optimization would live in the language-agnostic passes of LLVM anyway)

EDIT: TCO is not what I'm looking for - the logic above does not rely on the recursive call being in tail position. Also, contrary to some initial comments, Clang does not solve this for C++ in the general case - here's an example which has been modified such that the recursive call is not in tail position, and in which the recursive call is not inlined. https://godbolt.org/z/cGabvvvro (yes, one version prints twice - shouldn't affect main point)
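For reference, a Rust analogue of the non-tail-position case can be sketched as follows. This is a hypothetical translation, not the code behind the godbolt link, and the name `eval_slow_print` is made up for illustration. The `println!` after the `match` keeps the recursive call out of tail position, so tail call elimination cannot apply, yet the argument of the recursive call is still a known `Sum`.

```rust
pub enum Expr {
    Lit(isize),
    Sum(isize, isize),
    Sub(isize, isize),
}

// Hypothetical variant of `eval_slow`: work happens after the recursive
// call returns, so the call is not in tail position.
pub fn eval_slow_print(expr: Expr) -> isize {
    use Expr::*;
    let ret = match expr {
        Lit(x) => x,
        Sum(x, y) => x + y,
        // Recursive call whose argument is always a `Sum`.
        Sub(x, y) => eval_slow_print(Sum(x, -y)),
    };
    // Runs once per call, so a `Sub` input prints the result twice.
    println!("{ret}");
    ret
}
```

As in the C++ example, a `Sub` input prints twice, which does not affect the question of whether the inner call could be inlined.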

ajp
  • 1
    I only know the Visual Studio C++ compiler; for that one, you might want to take a look at the [`__forceinline` keyword](https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170) – HelpfulHelper Jul 31 '23 at 18:14
  • https://godbolt.org/z/43K8hxf98 – n. m. could be an AI Jul 31 '23 at 18:27
  • @n.m.couldbeanAI Interesting that in [this case](https://godbolt.org/z/TfsqKbz76) it does not inline the call. So much about zero cost abstractions :-P – chrysante Jul 31 '23 at 18:30
  • 1
    @chrysante Use -O2. – n. m. could be an AI Jul 31 '23 at 18:44
  • 1
    Interestingly, the same code (conceptually) that [in C++ is optimized by clang](https://godbolt.org/z/5Y1GsvPx1), is [not optimized in Rust by rustc](https://rust.godbolt.org/z/bn6TsWT74). It's known that rustc generates degenerate LLVM IR. Anyway, I removed the C++ tag since it optimizes well in C++. – Chayim Friedman Jul 31 '23 at 19:41
  • However, beware that while this optimizes in C++ in this simple made-up example, it is very likely it won't be optimized in real cases, where the function is large. – Chayim Friedman Jul 31 '23 at 19:43
  • 2
    If you mark the function as `#[inline(always)]` though, [it does inline](https://rust.godbolt.org/z/4sPEWoqjP), so I guess this answers the question? – Chayim Friedman Jul 31 '23 at 19:45
  • 3
    @ChayimFriedman using `#[inline(always)]` doesn't work here. It merely postpones code generation for that function (all `inline` functions are generated at the call site), so you don't see code in godbolt, but it would still generate recursion at the call site. See this: https://godbolt.org/z/cveh447T7 – Angelicos Phosphoros Jul 31 '23 at 22:00
  • I managed to catch 2 miscompilations using this code! https://godbolt.org/z/c7ffrjYY9 – Angelicos Phosphoros Jul 31 '23 at 22:19
  • 1
    Opened an issue for Rust compiler: https://github.com/rust-lang/rust/issues/114312 – Angelicos Phosphoros Jul 31 '23 at 22:31
  • @ChayimFriedman I readded the C++ tag, along with a slightly more complex C++ example which does not optimize well - see edit at the end of main post – ajp Jul 31 '23 at 23:53
  • @ajp you're missing `break;`. https://godbolt.org/z/fTPx4oPWr – ecatmur Aug 01 '23 at 00:24
  • @ecatmur oops - see here for updated version that contains recursive call https://godbolt.org/z/cGabvvvro (i know behavior is now not identical because eval_slow prints twice - but should still be inlineable) – ajp Aug 01 '23 at 01:22

2 Answers

2

Unfortunately, unlike Clang, rustc does not eliminate the tail call in this case by default.

However, if we switch from recursion to a loop, we get conceptually the same code without recursion:

pub fn eval_slow(mut expr: Expr) -> isize {
    use Expr::*;
    loop {
        expr = match expr {
            Lit(x) => return x,
            Sum(x, y) => return x + y,
            Sub(x, y) => Sum(x, -y),
        };
    }
}

Result:

example::eval_slow:
        mov     rax, qword ptr [rdi]
        test    rax, rax
        je      .LBB0_3
        cmp     rax, 2
        jne     .LBB0_2
        mov     rcx, qword ptr [rdi + 8]
        xor     eax, eax
        sub     rax, qword ptr [rdi + 16]
        mov     qword ptr [rdi], 1
        mov     qword ptr [rdi + 16], rax
        add     rax, rcx
        ret
.LBB0_3:
        mov     rax, qword ptr [rdi + 8]
        ret
.LBB0_2:
        mov     rcx, qword ptr [rdi + 8]
        mov     rax, qword ptr [rdi + 16]
        add     rax, rcx
        ret

godbolt link
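A small sanity check that the loop version computes the same results as eval_fast might look like the sketch below. The names `eval_loop` and `agree` are hypothetical, introduced only so both versions can live in one file. One caveat: `-y` overflows when `y == isize::MIN`, so in a debug build the rewritten `Sub` arm can panic on inputs where `x - y` itself would not (for example `x = -1`); with wrapping arithmetic in release builds the results coincide.

```rust
pub enum Expr {
    Lit(isize),
    Sum(isize, isize),
    Sub(isize, isize),
}

// Reference version from the question.
pub fn eval_fast(expr: Expr) -> isize {
    use Expr::*;
    match expr {
        Lit(x) => x,
        Sum(x, y) => x + y,
        Sub(x, y) => x - y,
    }
}

// Loop-based version from this answer, renamed so both can coexist.
pub fn eval_loop(mut expr: Expr) -> isize {
    use Expr::*;
    loop {
        expr = match expr {
            Lit(x) => return x,
            Sum(x, y) => return x + y,
            // Rewrite `Sub` into `Sum` and go around the loop once more.
            Sub(x, y) => Sum(x, -y),
        };
    }
}

// Hypothetical helper: true if both versions agree on all three variants.
pub fn agree(a: isize, b: isize) -> bool {
    eval_fast(Expr::Lit(a)) == eval_loop(Expr::Lit(a))
        && eval_fast(Expr::Sum(a, b)) == eval_loop(Expr::Sum(a, b))
        && eval_fast(Expr::Sub(a, b)) == eval_loop(Expr::Sub(a, b))
}
```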

  • thanks! partially useful, but I'm not satisfied, because "tail call" should not be required here - see Edit to main post – ajp Jul 31 '23 at 23:54
1

GCC calls this "recursive inlining":

  • max-inline-insns-recursive

  • max-inline-insns-recursive-auto

    • Specifies the maximum number of instructions an out-of-line copy of a self-recursive inline function can grow into by performing recursive inlining.

    • --param max-inline-insns-recursive applies to functions declared inline. For functions not declared inline, recursive inlining happens only when -finline-functions (included in -O3) is enabled; --param max-inline-insns-recursive-auto applies instead.

  • max-inline-recursive-depth

  • max-inline-recursive-depth-auto

    • Specifies the maximum recursion depth used for recursive inlining.

    • --param max-inline-recursive-depth applies to functions declared inline. For functions not declared inline, recursive inlining happens only when -finline-functions (included in -O3) is enabled; --param max-inline-recursive-depth-auto applies instead.

(etcetera)
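To make concrete what one level of recursive inlining would mean for the example in the question, here is a hand-written sketch. It is not compiler output, and `eval_slow_inlined_once` is a hypothetical name. The body of `eval_slow` is pasted once into its own call site; after that, the inner `match` has a known `Sum` scrutinee, so ordinary constant folding could discard the arm that still recurses.

```rust
pub enum Expr {
    Lit(isize),
    Sum(isize, isize),
    Sub(isize, isize),
}

// Original self-recursive version from the question.
pub fn eval_slow(expr: Expr) -> isize {
    use Expr::*;
    match expr {
        Lit(x) => x,
        Sum(x, y) => x + y,
        Sub(x, y) => eval_slow(Sum(x, -y)),
    }
}

// Hypothetical result of inlining `eval_slow` into itself to depth 1.
pub fn eval_slow_inlined_once(expr: Expr) -> isize {
    use Expr::*;
    match expr {
        Lit(x) => x,
        Sum(x, y) => x + y,
        // Inlined copy of `eval_slow(Sum(x, -y))`.
        Sub(x, y) => match Sum(x, -y) {
            Lit(a) => a,
            Sum(a, b) => a + b,
            // Never taken here, because the scrutinee is always a `Sum`;
            // an optimizer could remove this arm and the recursion with it.
            Sub(a, b) => eval_slow(Sum(a, -b)),
        },
    }
}
```

Once the dead arm is gone, the `Sub` case reduces to `x + (-y)`, which is the shape of eval_fast.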

According to this presentation from 2015, LLVM does not consider self-recursive (other than tail recursive) functions for inlining. I haven't looked at the code to see how straightforward it would be to add; it would necessarily require some level of heuristic.

An amusing technique to force LLVM to inline the non-tail recursive call is to generate multiple copies of the function using the (C++) template machinery, then rely on the optimizer to inline them back in and discard the redundant copies:

template<int I = 0>
int eval_slow(expr e)
{
    int ret;
    switch (e.op)
    {
        case expr::lit: ret = e.i1; break;
        case expr::plus: ret = e.i1 + e.i2; break;
        case expr::minus: ret = eval_slow<(I + 1) % 2>(expr{expr::plus, e.i1, -e.i2}); break;
    }
    std::cout << ret;
    return ret;
}
int eval_slow(expr e) { return eval_slow<>(e); }
ecatmur