The first step is to make your algorithm parallelizable, by making each iteration independent. The first thing I did was add a debug statement to print the final value of k
:
k += 1.0;
}
dbg!(k);
return 1.0 / total;
that printed k = 2
, so I can use that to create a range of k
values independent for each iteration:
(0..=iterations) // [0, 1, 2] for iterations = 2
We'll iterate over the elements in that range, instead of using the epsilon check you have:
pub fn estimate_pi(iterations: usize) -> f64 {
let mut total: f64 = 0.0;
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
for i in 0..=iterations {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
total += term;
}
return 1.0 / total;
}
// call estimate_pi(2)
Total is just a sum of all of the iterations, so we can convert this from a loop into an map-reduce operation. For each number in the range, we calculate the term
. Then, we use fold
(reduce) to calculate the sum.
pub fn estimate_pi(iterations: usize) -> f64 {
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
let sum: f64 = (0..=iterations).into_iter().map(|i| {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
term
}).fold(0.0, |a, b| a + b);
return 1.0 / sum;
}
Now we can use rayon's methods to convert this into a parallel operation. Replace into_iter()
with into_par_iter()
and fold(0.0, |a, b| a + b)
with reduce(|| 0.0, |a, b| a + b)
:
pub fn estimate_pi(iterations: usize) -> f64 {
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
// map is now a parallel map, and reduce is a parallel reduce
let sum: f64 = (0..=iterations).into_par_iter().map(|i| {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
term
}).reduce(|| 0.0, |a, b| a + b);
return 1.0 / sum;
}
Now to clean up the code a bit to make it more idiomatic:
- remove explicit typing where appropriate
- use implicit returns
- use constant for sqrt(2)
- more meaningful variable names
- embed
396
in the expressions
use std::f64::consts::*;
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
let numerator = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let denominator = factorial(k).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}
As a last step, we can make factorial
parallel as well:
// now have to call this with a `usize`
pub fn factorial(n: usize) -> f64 {
let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
out as f64
}
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
// notice we now pass the `i: usize` in here
let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}
Final Code
use rayon::prelude::*;
use std::f64::consts::*;
pub fn factorial(n: usize) -> f64 {
let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
out as f64
}
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}
fn main() {
// our algorithm results in the same value as the constant
println!("pi_a: {:.60}", estimate_pi(2));
println!("pi_c: {:.60}", PI);
}
Output
pi_a: 3.141592653589793115997963468544185161590576171875000000000000
pi_c: 3.141592653589793115997963468544185161590576171875000000000000
Playground
Recommendations
You should benchmark different versions of this with different amounts of parallelism to see what is more or less performant. Could be that rayon parallel iterations results in less performance since you have so few total iterations.
You might also consider using a lookup table for factorials since n <= k * 4 <= 8
:
pub fn factorial(n: usize) -> f64 {
const TABLE: [f64; 9] = [
1.0, // 0!
1.0, // 1!
2.0, // 2!
6.0, // 3!
24.0, // 4!
120.0, // 5!
720.0, // 6!
5040.0, // 7!
40320.0, // 8!
];
TABLE[n]
}
Playground
And, of course, enabling inlining can help as well.