The code is naive:

use std::time;

fn main() {
    const NUM_LOOP: u64 = std::u64::MAX;
    let mut sum = 0u64;
    let now = time::Instant::now();
    for i in 0..NUM_LOOP {
        sum += i;
    }
    let d = now.elapsed();
    println!("{}", sum);
    println!("loop: {}.{:09}s", d.as_secs(), d.subsec_nanos());
}

The output is:

$ ./test.rs.out
9223372036854775809
loop: 0.000000060s
$ ./test.rs.out
9223372036854775809
loop: 0.000000052s
$ ./test.rs.out
9223372036854775809
loop: 0.000000045s
$ ./test.rs.out
9223372036854775809
loop: 0.000000041s
$ ./test.rs.out
9223372036854775809
loop: 0.000000046s
$ ./test.rs.out
9223372036854775809
loop: 0.000000047s
$ ./test.rs.out
9223372036854775809
loop: 0.000000045s

The program ends almost immediately. I also wrote equivalent code in C using a for loop, but it ran for a long time. I'm wondering what makes the Rust code so fast.

The C code:

#include <stdint.h>
#include <time.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>

double time_elapse(struct timespec start) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return now.tv_sec - start.tv_sec +
           (now.tv_nsec - start.tv_nsec) / 1000000000.;
}

int main() {
    const uint64_t NUM_LOOP = 18446744073709551615u;
    uint64_t sum = 0;
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);

    for (int i = 0; i < NUM_LOOP; ++i) {
        sum += i;
    }

    double t = time_elapse(now);
    printf("value of sum is: %llu\n", sum);
    printf("time elapse is: %lf sec\n", t);

    return 0;
}

The Rust code is compiled with -O and the C code with -O3. The C code runs so slowly that it hasn't finished yet.

After fixing the bug found by visibleman and Sandeep, both programs print the same number in almost no time. When I reduced NUM_LOOP by one, the results still seemed reasonable given the overflow. Moreover, with NUM_LOOP = 1000000000, neither program overflows, and both produce the correct answer in no time. What optimizations are used here? I know we can use a simple formula like (0 + NUM_LOOP - 1) * NUM_LOOP / 2 to compute the result, but I don't think compilers perform such computations when overflow is possible...

Lukas Kalbertodt
Sanhu Li

3 Answers


Your Rust code (without the prints and timing) compiles down to the following (on Godbolt):

movabs rax, -9223372036854775807
ret

LLVM just const-folds the whole function and calculates the final value for you.

Let's make the upper limit dynamic (non constant) to avoid this aggressive constant folding:

pub fn foo(num: u64) -> u64 {
    let mut sum = 0u64;
    for i in 0..num {
        sum += i;
    }

    sum
}

This results in (Godbolt):

  test rdi, rdi            ; if num == 0
  je .LBB0_1               ; jump to .LBB0_1
  lea rax, [rdi - 1]       ; sum = num - 1
  lea rcx, [rdi - 2]       ; rcx = num - 2
  mul rcx                  ; rdx:rax = sum * rcx (full 128-bit product)
  shld rdx, rax, 63        ; rdx = low 64 bits of (rdx:rax >> 1), i.e. product / 2
  lea rax, [rdx + rdi]     ; sum = rdx + num
  add rax, -1              ; sum -= 1
  ret
.LBB0_1:
  xor eax, eax             ; sum = 0
  ret

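As you can see, the optimizer understood that you are summing all integers from 0 up to (but not including) num and replaced your loop with a closed-form, constant-time formula: ((num - 1) * (num - 2)) / 2 + num - 1, which equals num * (num - 1) / 2. Because mul produces the full 128-bit product before the halving, the result matches the wrapping loop modulo 2^64 even when the intermediate product does not fit in 64 bits. As for the first example: the optimizer probably first turned the loop into this formula and then constant-folded it.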
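To make the overflow behaviour concrete, here is a minimal sketch that mirrors the generated assembly in plain Rust and compares it with a wrapping loop. The function names sum_loop and sum_closed_form are made up for this illustration, and the u128 intermediate is my way of modelling the 128-bit result of mul; it is not something the compiler emits as source.

```rust
/// Reference implementation: the original loop, with explicit wrapping
/// addition so it behaves the same in debug and release builds.
fn sum_loop(num: u64) -> u64 {
    let mut sum = 0u64;
    for i in 0..num {
        sum = sum.wrapping_add(i);
    }
    sum
}

/// Closed form modelled on the assembly:
/// ((num - 1) * (num - 2)) / 2 + num - 1, with a 128-bit intermediate product.
fn sum_closed_form(num: u64) -> u64 {
    if num == 0 {
        return 0;
    }
    let a = (num - 1) as u128;
    // `lea rcx, [rdi - 2]` wraps for num == 1, so wrap here as well.
    let b = num.wrapping_sub(2) as u128;
    // Halve the full 128-bit product, then keep only the low 64 bits.
    let half = ((a * b) >> 1) as u64;
    half.wrapping_add(num).wrapping_sub(1)
}

fn main() {
    // The closed form agrees with the loop for small inputs.
    for num in 0..1_000u64 {
        assert_eq!(sum_loop(num), sum_closed_form(num));
    }
    // The three values reported in the question and comments.
    println!("{}", sum_closed_form(u64::MAX));
    println!("{}", sum_closed_form(u64::MAX - 1));
    println!("{}", sum_closed_form(1_000_000_000));
}
```

The last three lines print 9223372036854775809, 9223372036854775811 and 499999999500000000, the same numbers the question reports for the corresponding values of NUM_LOOP.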

Additional notes

  • The two other answers already point out your bug in the C program. When fixed, clang generates exactly the same assembly (unsurprisingly). However, GCC doesn't seem to know about this optimization and generates pretty much the assembly you would expect (a loop).
  • In Rust, an easier and more idiomatic way to write your code would be (0..num).sum(). Despite this using more layers of abstraction (namely, iterators), the compiler generates exactly the same code as above.
  • To print a Duration in Rust, you can use the {:?} format specifier. println!("{:.2?}", d); prints the duration in the most fitting unit with a precision of 2. That's a fine way to print the time for almost all kinds of benchmarks.
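Putting the last two notes together, a small timing sketch could look like this. The name sum_range and the input 1_000_000 are my own choices for illustration. std::hint::black_box is only a best-effort hint: it keeps the optimizer from treating the input as a known constant, but it does not stop the loop from being rewritten into the closed-form formula.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Iterator-based version of the summation loop.
fn sum_range(num: u64) -> u64 {
    (0..num).sum()
}

fn main() {
    // Hide the bound from the optimizer so the call is not constant-folded.
    // The value is kept small so the sum also fits in a u64 in debug builds,
    // where integer overflow panics instead of wrapping.
    let num = black_box(1_000_000u64);

    let start = Instant::now();
    let sum = black_box(sum_range(num));
    let d = start.elapsed();

    println!("{}", sum);
    // Debug formatting picks a fitting unit; .2 limits the precision.
    println!("loop: {:.2?}", d);
}
```

With optimizations enabled, expect the measured time to stay tiny regardless of num, for the reason explained above.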
trent
Lukas Kalbertodt

Since an int can never be as big as your NUM_LOOP, the program will loop eternally.

const uint64_t NUM_LOOP = 18446744073709551615u;

for (int i = 0; i < NUM_LOOP; ++i) { // Change the type of i to uint64_t

If you fix the int bug, the compiler will optimize away these loops in both cases.
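For comparison, here is a rough sketch of the range mismatch, written in Rust so it can sit next to the question's Rust program. It assumes int is 32 bits wide, which is typical on mainstream platforms but not guaranteed by the C standard, and counter_can_reach is a hypothetical helper invented for this example.

```rust
/// Returns true if a signed 32-bit counter that starts at 0 and counts
/// upwards can reach `limit` without overflowing.
/// Assumption: C's `int` is modelled as Rust's `i32`.
fn counter_can_reach(limit: u64) -> bool {
    limit <= i32::MAX as u64
}

fn main() {
    // A bound of one billion is within reach of a 32-bit counter.
    assert!(counter_can_reach(1_000_000_000));
    // The bound from the question is far out of reach.
    assert!(!counter_can_reach(u64::MAX));

    println!("i32::MAX = {}", i32::MAX);
    println!("u64::MAX = {}", u64::MAX);
}
```

The Rust program in the question does not run into this, because in for i in 0..NUM_LOOP the loop variable takes the type of NUM_LOOP, which is u64.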

Shepmaster
visibleman
  • Sorry, forgot that part. Really appreciate your help. I have another question: if the compiler optimized away the loops, where do the programs get the numbers? Both programs are printing the same number, and I tried to reduce NUM_LOOP by one and the result is 9223372036854775811 from both programs. Considering the overflow, it makes sense. If the loops are optimized out, how can we get the numbers? I also tried NUM_LOOP=1000000000, which will not produce an overflow, and results from both programs are 499999999500000000 in almost no time. How can the programs do that? – Sanhu Li Oct 24 '18 at 09:26
  • Actually, I was planning to update my answer. The loop can be expressed arithmetically in constant time, and compilers are smart enough to do this. However, when I tested this earlier, the results were not clear-cut as to whether this optimisation takes place; it depends on the compiler version and the value of the NUM_LOOP constant. – visibleman Oct 24 '18 at 10:11

Your code is stuck in an infinite loop.

The comparison i < NUM_LOOP will always return true, since int i will wrap around before reaching NUM_LOOP.

Sandy