
I wrote this function for an Advent of Code problem:

```rust
// Find the square window (of any size) with the largest total power,
// growing each candidate square incrementally by adding its new right
// column and bottom row. Returns (x, y, size), 1-based.
pub fn best_window_variable(grid: &Vec<Vec<i32>>) -> (usize, usize, usize) {
    let mut best_power = i32::MIN;
    let mut best_window = (0usize, 0usize, 0usize);

    for y in 0..grid.len() {
        for x in 0..grid[y].len() {
            let space = std::cmp::min(grid.len() - y, grid[y].len() - x);
            let mut window_power = 0i32;
            for z in 1..=space {
                for dy in 0..z {
                    window_power += grid[y + dy][x + z - 1];
                }

                for dx in 0..(z - 1) {
                    window_power += grid[y + z - 1][x + dx];
                }

                if window_power > best_power {
                    best_power = window_power;
                    best_window = (x + 1, y + 1, z);
                }
            }
        }
    }

    best_window
}
```

The optimised (release) build is 50x faster than the unoptimised (debug) build. I would like to understand which optimisations provide this improvement.

I am learning x86-64 assembly, and I am trying to read the assembly output of both builds (via cargo-asm) to understand the difference.

Is there an easier way to get to the answer?

Does LLVM log which optimisations are applied? If so, would it be possible to gauge which had the most impact on runtime?

Would it be easier to use a tool like Ghidra to analyse the compiled code?
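One route that avoids reading disassembly: rustc can forward LLVM's optimisation remarks, and on nightly it can list the passes being run. A sketch of the invocations (flag spellings assumed from the rustc documentation; the `-Z` flag requires a nightly toolchain):

```shell
# Ask LLVM to report which optimisation remarks fired and where.
# -Cdebuginfo=1 gives the remarks usable source locations.
cargo rustc --release -- -Cremark=all -Cdebuginfo=1

# Nightly only: print the LLVM passes that are run over the crate.
cargo +nightly rustc --release -- -Zprint-llvm-passes
```

The remark output is verbose, so filtering it (e.g. to a single function's mangled name) is usually necessary.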

  • See also [What optimization is making my rust program so much faster?](https://stackoverflow.com/q/54320166/155423); [What LLVM passes are performed on emitted LLVM IR?](https://stackoverflow.com/q/50068793/155423) – Shepmaster Sep 28 '21 at 13:58
  • I actually think this is a duplicate of the latter (the `-Z print-llvm-passes` flag, specifically). – Shepmaster Sep 28 '21 at 14:00
  • Mostly; the only piece that's missing for me is how to know which optimization is really making the big difference. What would be the best way to find that out? Should I test it with different optimisation options switched on/off and measure the runtime differences? – road_rash Sep 28 '21 at 14:19
  • Just a note: a large speedup comes from dropping debug assertions. For example, in a debug build every arithmetic operation performs an overflow check, while in release mode they simply wrap. – Aiden4 Sep 28 '21 at 14:26
  • Most likely it's a combination of many optimization passes that makes the difference, not any individual one. The passes are designed to work in concert. I'm all for learning with hands-on examples, but my gut feeling in this particular case is that you'd learn far more by first reading a general introduction to optimizing compilers. Be prepared to invest a lot of time, though – it's a complex topic. – Sven Marnach Sep 28 '21 at 14:26
  • It's also important to realize that Rust uses abstractions that _rely_ on optimizing compilers. Even something as simple as `for foo in 0..100 { ... }` "instantiates" types such as `Range` and `RangeIteratorImpl`, and calls standard library functions like `std::mem::replace()`, `i32::forward_unchecked()`, or `copy_nonoverlapping()`. All of these are optimized into virtually nothing, but they seriously bloat unoptimized output, making it much slower than the unoptimized C equivalent of `for (int i = 0; i < 100; i++) { ... }`. – user4815162342 Sep 28 '21 at 16:05
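To illustrate that last point, here is a rough, hand-written approximation of what a range loop desugars to (simplified; the real desugaring goes through `IntoIterator::into_iter` and then repeated `Iterator::next` calls, each of which is a genuine function call in an unoptimized build):

```rust
// Hand-written sketch of the loop `for i in 0..n { sum += i; }`.
// In a debug build every `next` here is a real method call on the
// Range value; in a release build it collapses to a counted loop.
fn desugared_sum(n: i32) -> i32 {
    let mut sum = 0;
    let mut iter = std::ops::Range { start: 0, end: n }; // the `0..n` literal
    while let Some(i) = Iterator::next(&mut iter) {
        sum += i;
    }
    sum
}

fn main() {
    // Same result as the idiomatic `for` loop / iterator sum.
    assert_eq!(desugared_sum(100), (0..100).sum::<i32>());
}
```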

0 Answers