
When doing integer arithmetic with overflow checks, a calculation often needs to compose several elementary operations. A straightforward way of chaining checked arithmetic in Rust uses the checked_* methods and Option chaining:

fn calculate_size(elem_size: usize,
                  length: usize,
                  offset: usize)
                  -> Option<usize> {
    elem_size.checked_mul(length)
             .and_then(|acc| acc.checked_add(offset))
}
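
To make the contract concrete, a quick check of my own (the sample values are arbitrary): the chain yields Some only when every step fits in usize.

fn main() {
    assert_eq!(calculate_size(8, 3, 4), Some(28));      // 8 * 3 + 4
    assert_eq!(calculate_size(usize::MAX, 2, 0), None); // the multiplication overflows
}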

However, this tells the compiler to generate a branch for each elementary operation. I have encountered a more unrolled approach using the overflowing_* methods:

fn calculate_size(elem_size: usize,
                  length: usize,
                  offset: usize)
                  -> Option<usize> {
    let (acc, oflo1) = elem_size.overflowing_mul(length);
    let (acc, oflo2) = acc.overflowing_add(offset);
    if oflo1 | oflo2 {
        None
    } else {
        Some(acc)
    }
}

Continuing the computation regardless of overflows and aggregating the overflow flags with bitwise OR ensures that at most one branch is taken in the entire evaluation (provided that the implementations of overflowing_* generate branchless code). This optimization-friendly approach is more cumbersome and requires some caution in dealing with intermediate values.
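
The same flag-aggregation pattern extends to longer expressions; here is a sketch of my own for a * b + c * d (the function name and shape are illustrative, not from any real code). A wrapped intermediate value may flow into later operations, but the aggregated flags still record that an overflow happened somewhere:

fn mul_mul_add(a: usize, b: usize, c: usize, d: usize) -> Option<usize> {
    let (lhs, oflo1) = a.overflowing_mul(b);
    let (rhs, oflo2) = c.overflowing_mul(d);
    // lhs or rhs may have wrapped here; the flags remember it.
    let (sum, oflo3) = lhs.overflowing_add(rhs);
    if oflo1 | oflo2 | oflo3 {
        None
    } else {
        Some(sum)
    }
}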

Does anyone have experience with how the Rust compiler optimizes either of the patterns above on various CPU architectures, to tell whether the explicit unrolling is worthwhile, especially for more complex expressions?

mzabaluev
  • It's not very clear what your question is. What would an ideal answer contain? – Shepmaster Mar 20 '16 at 14:42
  • I'd like people to come up with anecdotes, or better, observations of consistent compiler behavior, where the unrolled approach ends up significantly better than the more ergonomic code with `Option`. – mzabaluev Mar 21 '16 at 17:55

1 Answer


Does anyone have experience with how the Rust compiler optimizes either of the patterns above on various CPU architectures, to tell whether the explicit unrolling is worthwhile, especially for more complex expressions?

You can use the playground to check how LLVM optimizes things: just click on "LLVM IR" or "ASM" instead of "Run". Stick #[inline(never)] on the function you wish to check, and make sure to pass it run-time arguments to avoid constant folding, as here:

use std::env;

#[inline(never)]
fn calculate_size(elem_size: usize,
                  length: usize,
                  offset: usize)
                  -> Option<usize> {
    let (acc, oflo1) = elem_size.overflowing_mul(length);
    let (acc, oflo2) = acc.overflowing_add(offset);
    if oflo1 | oflo2 {
        None
    } else {
        Some(acc)
    }
}

fn main() {
    // Skip the program name, then parse the numeric arguments at run time
    // so that LLVM cannot constant-fold the call.
    let vec: Vec<usize> = env::args().skip(1).map(|s| s.parse().unwrap()).collect();
    let result = calculate_size(vec[0], vec[1], vec[2]);
    println!("{:?}", result);
}

The answer you'll get, however, is that the overflow intrinsics in Rust and LLVM have been coded for convenience and not performance, unfortunately. This means that while the explicit unrolling optimizes well, counting on LLVM to optimize the checked code is not realistic for now.
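
For comparison, you can drop the Option-chaining version into the same harness (the _checked suffix is just my naming for this sketch) and inspect its ASM side by side with the unrolled version:

use std::env;

#[inline(never)]
fn calculate_size_checked(elem_size: usize,
                          length: usize,
                          offset: usize)
                          -> Option<usize> {
    elem_size.checked_mul(length)
             .and_then(|acc| acc.checked_add(offset))
}

fn main() {
    let vec: Vec<usize> = env::args().skip(1).map(|s| s.parse().unwrap()).collect();
    println!("{:?}", calculate_size_checked(vec[0], vec[1], vec[2]));
}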

Normally this is not an issue; but for a performance hotspot, you may want to unroll manually.

Note: this lack of performance is also the reason that overflow checking is disabled by default in Release mode.
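
A minimal demonstration of that default (my own sketch; the value is read at run time so the compiler cannot reject the overflow statically):

use std::env;

fn main() {
    // Defaults to 255 if no argument is given.
    let x: u8 = env::args().nth(1).and_then(|s| s.parse().ok()).unwrap_or(255);
    // In a debug build this panics with "attempt to add with overflow";
    // in a release build, where checks are off, it wraps around to 0.
    println!("{}", x + 1);
}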

Matthieu M.
  • Thanks for the playground tips! On x86-64, the optimizer produces very similar code from both examples, but only because it turns `oflo1 | oflo2 | ...` into a series of tests and branches. If that's not really a problem after all, the idiomatic code may not be a prime candidate for hand-optimization. The branches will tend to be well-predicted in typical calculations (where overflows are exceptional or result from misuse), but there may be many of them, cranking up eviction pressure on the branch predictor, especially if the code is generic. – mzabaluev Mar 21 '16 at 19:22