How to use Rayon for parallel calculation of PI

Question

My code can calculate Pi using Ramanujan's formula without Rayon and I want to implement Rayon for parallel threading as this is my project.

I know that I need to use this

use rayon::prelude::*;
fn sum_of_squares(input: &[f64]) ->f64 {
    for i in total.iter() // <-- just change that!
         .map(|&i| i * i)
         .sum()
}

but I still don't understand what to do.

Here is my code

use rayon::prelude::*;

pub fn factorial(n: f64) -> f64 {
    if n == 0.0 {
        return 1.00;
    } else {
        let result: f64 = factorial(n - 1.0) * n;

        return result;
    }
}

pub fn estimate_pi() -> f64 {
    let mut total: f64 = 0.0;
    let mut k: f64 = 0.0;
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    loop {
        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        total += term;

        if term.abs() < 1e-15 {
            break;
        }

        k += 1.0;
    }

    return 1.0 / total;
}

fn main() {
    println!("{}", estimate_pi());
}

Playground

PitaJ · Accepted Answer · 2019-12-08T07:46:17.973

The first step is to make your algorithm parallelizable, by making each iteration independent. The first thing I did was add a debug statement to print the final value of k:

        k += 1.0;
    }

    dbg!(k);

    return 1.0 / total;

that printed k = 2, so I can use that to create a range of k values independent for each iteration:

(0..=iterations) // [0, 1, 2] for iterations = 2

We'll iterate over the elements in that range, instead of using the epsilon check you have:

pub fn estimate_pi(iterations: usize) -> f64 {
    let mut total: f64 = 0.0;
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    for i in 0..=iterations {
        let k: f64 = i as f64;

        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        total += term;
    }

    return 1.0 / total;
}

// call estimate_pi(2)

Total is just a sum of all of the iterations, so we can convert this from a loop into an map-reduce operation. For each number in the range, we calculate the term. Then, we use fold (reduce) to calculate the sum.

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    let sum: f64 = (0..=iterations).into_iter().map(|i| {
        let k: f64 = i as f64;

        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        term
    }).fold(0.0, |a, b| a + b);

    return 1.0 / sum;
}

Now we can use rayon's methods to convert this into a parallel operation. Replace into_iter() with into_par_iter() and fold(0.0, |a, b| a + b) with reduce(|| 0.0, |a, b| a + b):

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    // map is now a parallel map, and reduce is a parallel reduce
    let sum: f64 = (0..=iterations).into_par_iter().map(|i| {
        let k: f64 = i as f64;

        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        term
    }).reduce(|| 0.0, |a, b| a + b);

    return 1.0 / sum;
}

Now to clean up the code a bit to make it more idiomatic:

remove explicit typing where appropriate
use implicit returns
use constant for sqrt(2)
more meaningful variable names
embed 396 in the expressions

use std::f64::consts::*;

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor = (SQRT_2 * 2.0) / 9801.0;

    let sum = (0..=iterations).into_par_iter().map(|i| {
        let k = i as f64;

        let numerator = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let denominator = factorial(k).powf(4.0) * (396_f64).powf(4.0 * k);

        factor * numerator / denominator
    }).reduce(|| 0.0, |a, b| a + b);

    1.0 / sum
}

As a last step, we can make factorial parallel as well:

// now have to call this with a `usize`
pub fn factorial(n: usize) -> f64 {
    let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
    out as f64
}

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor = (SQRT_2 * 2.0) / 9801.0;

    let sum = (0..=iterations).into_par_iter().map(|i| {
        let k = i as f64;

        // notice we now pass the `i: usize` in here
        let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
        let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);

        factor * numerator / denominator
    }).reduce(|| 0.0, |a, b| a + b);

    1.0 / sum
}

Final Code

use rayon::prelude::*;
use std::f64::consts::*;

pub fn factorial(n: usize) -> f64 {
    let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
    out as f64
}

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor = (SQRT_2 * 2.0) / 9801.0;

    let sum = (0..=iterations).into_par_iter().map(|i| {
        let k = i as f64;

        let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
        let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);

        factor * numerator / denominator
    }).reduce(|| 0.0, |a, b| a + b);

    1.0 / sum
}

fn main() {
    // our algorithm results in the same value as the constant
    println!("pi_a: {:.60}", estimate_pi(2));
    println!("pi_c: {:.60}", PI);
}

Output

pi_a: 3.141592653589793115997963468544185161590576171875000000000000
pi_c: 3.141592653589793115997963468544185161590576171875000000000000

Playground

Recommendations

You should benchmark different versions of this with different amounts of parallelism to see what is more or less performant. Could be that rayon parallel iterations results in less performance since you have so few total iterations.

You might also consider using a lookup table for factorials since n <= k * 4 <= 8:

pub fn factorial(n: usize) -> f64 {
    const TABLE: [f64; 9] = [
        1.0,     // 0!
        1.0,     // 1!
        2.0,     // 2!
        6.0,     // 3!
        24.0,    // 4!
        120.0,   // 5!
        720.0,   // 6!
        5040.0,  // 7!
        40320.0, // 8!
    ];

    TABLE[n]
}

Playground

And, of course, enabling inlining can help as well.

60 is an absurd precision to print an `f64` with; they only have about 16 to start with. — trent, Dec 08 '19 at 08:50
I honestly don't know how to thank you enough. God bless . You just saved my life — Adil Roubleh, Dec 08 '19 at 12:41
@trentcl I just picked a number where I could see all the zeroes at the end — PitaJ, Dec 08 '19 at 17:22

How to use Rayon for parallel calculation of PI

1 Answers1

Final Code

Recommendations