Rayon: why do the two ways of sum give different results?

Question

Here I use Rust rayon crate to compute the definite integral (left Riemann sum) of a function in parallel:

use rayon::prelude::*;
use rayon::ThreadPoolBuilder;
use std::f64::consts::PI;

fn integral_rayon_test<F>(segments: u32, x0: f64, xn: f64, function: F)
where
    F: Fn(f64) -> f64 + Send + Sync + Copy + 'static,
{
    let range = xn - x0;
    let d = range / segments as f64;

    let sum1 = (0..segments)
        .into_par_iter()
        .map(|x| function(x0 + x as f64 * d) * d)
        .collect::<Vec<_>>()
        .iter()
        .sum::<f64>();

    let sum2 = (0..segments)
        .into_par_iter()
        .map(|x| function(x0 + x as f64 * d) * d)
        .sum::<f64>();

    println!("{} {}", sum1, sum2);
}

fn main() {
    ThreadPoolBuilder::new()
        .num_threads(3)
        .build_global()
        .unwrap();

    for _ in 0..1000 {
        integral_rayon_test(100000, 0.0, 2.0 * PI, f64::sin);
    }
}

There are two methods I compute the integral: sum1 is to first collect the par_iter values, and then sum them up; sum2 is to directly call sum on par_iter.

Then I run the test function integral_rayon_test multiple times, it produces these outputs:

-0.00000000000000020519185459954437 0.0000000000000006661338147750939
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
-0.00000000000000020519185459954437 -0.0000000000000002220446049250313
-0.00000000000000020519185459954437 -0.0000000000000002220446049250313
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
-0.00000000000000020519185459954437 -0.0000000000000011102230246251565
-0.00000000000000020519185459954437 0
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
-0.00000000000000020519185459954437 0
-0.00000000000000020519185459954437 -0.0000000000000002220446049250313
-0.00000000000000020519185459954437 -0.0000000000000002220446049250313
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
-0.00000000000000020519185459954437 -0.0000000000000002220446049250313
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
...

Then I have two questions:

First, I expect that sum1 (column 1) be equal to sum2 (column 2), but they are not.

Second, as you can see, the sum1 numbers are all the same, but each sum2 value has a slight difference. I also expect them to be the same. This looks like data racing, but I don't think that because rayon claims its data race free. I don't know what I'm missing.

When I change the thread number to 1, it outputs:

-0.00000000000000020519185459954437 0
-0.00000000000000020519185459954437 0
-0.00000000000000020519185459954437 0
-0.00000000000000020519185459954437 0
...

where sum2 values are the same, but still sum1 differs from sum2. I don't know why.

Thanks for the help!

UPDATE: Test target is x86_64-unknown-linux-gnu

/proc/cpuinfo shows fpu and fpu_exception true

For the second question, I just ignored that IEEE-754 floating-point numbers are not commutative, and this may be the cause.

I tested on aarch64-linux-android target, with thread number 1, it outputs:

-0.0000000000000006116797606620095 -0.0000000000000002220446049250313
-0.0000000000000006116797606620095 -0.0000000000000002220446049250313
-0.0000000000000006116797606620095 -0.0000000000000002220446049250313
-0.0000000000000006116797606620095 -0.0000000000000002220446049250313
...

where all sum1 and sum2 are the same. But this result is different from the above -0.00000000000000020519185459954437 0 output.

Can this be related to hardware floating point arithmetic? Now I have some trouble compiling the test program with +soft-float target feature, and cannot do some further tests.

Operations on IEEE 754 Floating-Point Numbers (`f32` and `f64`) are not commutative. The order matters. — PitaJ, Dec 10 '22 at 05:28
All of these numbers are effectively zero and are well within the rounding error of floating point operations on numbers of magnitude roughly `1 / 100000`, as determined by your choice of `segments` — BallpointBen, Dec 10 '22 at 05:30
@PitaJ Oh yes, I just ignored it, and this makes sense. Now case 1 is okay, but for the second question (the 1-thread case), these numbers are added sequentially, but still `num1` and `num2` is not equal. — bczhc, Dec 10 '22 at 05:51
Rayon is work-stealing. The order it will execute is not necessarily in the order operations are submitted, or even deterministic. — PitaJ, Dec 10 '22 at 05:55
On what hardware do you run the program? If this uses the x87 FPU (I don't remember which FPU Rust uses by default on x86-64 CPUs), then it's possible that `sum2` will do the whole computation with 80 bits of precision and only round to 64 bits at the end, whereas `sum1` will round each value to 64 bits when collecting the vector. — Jmb, Dec 10 '22 at 06:52

Rayon: why do the two ways of sum give different results?

0 Answers0