The code is naive:
use std::time;

fn main() {
    const NUM_LOOP: u64 = std::u64::MAX;
    let mut sum = 0u64;
    let now = time::Instant::now();
    for i in 0..NUM_LOOP {
        sum += i;
    }
    let d = now.elapsed();
    println!("{}", sum);
    println!("loop: {}.{:09}s", d.as_secs(), d.subsec_nanos());
}
The output is:
$ ./test.rs.out
9223372036854775809
loop: 0.000000060s
$ ./test.rs.out
9223372036854775809
loop: 0.000000052s
$ ./test.rs.out
9223372036854775809
loop: 0.000000045s
$ ./test.rs.out
9223372036854775809
loop: 0.000000041s
$ ./test.rs.out
9223372036854775809
loop: 0.000000046s
$ ./test.rs.out
9223372036854775809
loop: 0.000000047s
$ ./test.rs.out
9223372036854775809
loop: 0.000000045s
The program finishes almost immediately. I also wrote equivalent code in C using a for loop, but it ran for a long time. I'm wondering what makes the Rust code so fast.
The C code:
#include <stdint.h>
#include <time.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

double time_elapse(struct timespec start) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return now.tv_sec - start.tv_sec +
           (now.tv_nsec - start.tv_nsec) / 1000000000.;
}

int main() {
    const uint64_t NUM_LOOP = 18446744073709551615u;
    uint64_t sum = 0;
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    for (int i = 0; i < NUM_LOOP; ++i) {
        sum += i;
    }
    double t = time_elapse(now);
    printf("value of sum is: %llu\n", sum);
    printf("time elapse is: %lf sec\n", t);
    return 0;
}
The Rust code is compiled with -O and the C code is compiled with -O3. The C code runs so slowly that it still hasn't finished.
After fixing the bug found by visibleman and Sandeep, both programs print the same number in almost no time. When I reduced NUM_LOOP by one, the results still seemed reasonable given the overflow. Moreover, with NUM_LOOP = 1000000000, neither program overflows and both produce the correct answer in no time. What optimizations are used here? I know we can use a simple formula like (0 + NUM_LOOP - 1) * NUM_LOOP / 2 to compute the result, but I don't think compilers do such computations when overflow is involved...