Here I use Rust rayon
crate to compute the definite integral (left Riemann sum) of a function in parallel:
use rayon::prelude::*;
use rayon::ThreadPoolBuilder;
use std::f64::consts::PI;
fn integral_rayon_test<F>(segments: u32, x0: f64, xn: f64, function: F)
where
F: Fn(f64) -> f64 + Send + Sync + Copy + 'static,
{
let range = xn - x0;
let d = range / segments as f64;
let sum1 = (0..segments)
.into_par_iter()
.map(|x| function(x0 + x as f64 * d) * d)
.collect::<Vec<_>>()
.iter()
.sum::<f64>();
let sum2 = (0..segments)
.into_par_iter()
.map(|x| function(x0 + x as f64 * d) * d)
.sum::<f64>();
println!("{} {}", sum1, sum2);
}
fn main() {
ThreadPoolBuilder::new()
.num_threads(3)
.build_global()
.unwrap();
for _ in 0..1000 {
integral_rayon_test(100000, 0.0, 2.0 * PI, f64::sin);
}
}
There are two methods I compute the integral: sum1
is to first collect the par_iter
values, and then sum them up; sum2
is to directly call sum
on par_iter
.
Then I run the test function integral_rayon_test
multiple times, it produces these outputs:
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
-0.00000000000000020519185459954437 -0.0000000000000002220446049250313
-0.00000000000000020519185459954437 -0.0000000000000002220446049250313
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
-0.00000000000000020519185459954437 -0.0000000000000011102230246251565
-0.00000000000000020519185459954437 0
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
-0.00000000000000020519185459954437 0
-0.00000000000000020519185459954437 -0.0000000000000002220446049250313
-0.00000000000000020519185459954437 -0.0000000000000002220446049250313
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
-0.00000000000000020519185459954437 -0.0000000000000002220446049250313
-0.00000000000000020519185459954437 0.0000000000000006661338147750939
...
Then I have two questions:
First, I expect that sum1
(column 1) be equal to sum2
(column 2), but they are not.
Second, as you can see, the sum1
numbers are all the same, but each sum2
value has a slight difference. I also expect them to be the same. This looks like data racing, but I don't think that because rayon claims its data race free. I don't know what I'm missing.
When I change the thread number to 1, it outputs:
-0.00000000000000020519185459954437 0
-0.00000000000000020519185459954437 0
-0.00000000000000020519185459954437 0
-0.00000000000000020519185459954437 0
...
where sum2
values are the same, but still sum1
differs from sum2
. I don't know why.
Thanks for the help!
UPDATE:
Test target is x86_64-unknown-linux-gnu
/proc/cpuinfo shows fpu and fpu_exception true
For the second question, I just ignored that IEEE-754 floating-point numbers are not commutative, and this may be the cause.
I tested on aarch64-linux-android
target, with thread number 1, it outputs:
-0.0000000000000006116797606620095 -0.0000000000000002220446049250313
-0.0000000000000006116797606620095 -0.0000000000000002220446049250313
-0.0000000000000006116797606620095 -0.0000000000000002220446049250313
-0.0000000000000006116797606620095 -0.0000000000000002220446049250313
...
where all sum1
and sum2
are the same. But this result is different from the above -0.00000000000000020519185459954437 0
output.
Can this be related to hardware floating point arithmetic? Now I have some trouble compiling the test program with +soft-float
target feature, and cannot do some further tests.