
To practice Rust I'm implementing a raytracer.

As a first step I implemented it using only spheres, looping over a Vec of Sphere structs. I could easily increase performance with rayon by calling into_par_iter.

Now I'm trying to add cubes. So instead of looping over Vec<Sphere> I'm looping over Vec<Box<dyn Body>>. This seems to negate the performance gained by using rayon.

Why is this performance loss happening, and how can I fix it?

In this code snippet I can reproduce the behavior.

use rayon::prelude::*;
use std::f64::consts::PI;
use std::time::Instant;

pub trait Body: Sync + Send {
    fn size(&self) -> f64 {
        0.
    }
}

#[derive(Clone)]
pub struct Sphere {
    radius: f64,
}

impl Sphere {
    pub fn new() -> Self {
        Sphere { radius: 4.0 }
    }
}

impl Body for Sphere {
    fn size(&self) -> f64 {
        4.0 / 3.0 * PI * self.radius * self.radius * self.radius
    }
}
pub struct World {
    bodies: Vec<Box<dyn Body>>,
    spheres: Vec<Sphere>,
}

impl World {
    fn new() -> Self {
        let mut world = World {
            bodies: vec![],
            spheres: vec![],
        };
        for _ in 0..100000000 {
            world.spheres.push(Sphere::new());
            world.bodies.push(Box::new(Sphere::new()));
        }
        world
    }
}

pub fn main() {
    println!("Creating structs");
    let world = World::new();
    println!("Normal,Spheres");
    let timer = Instant::now();
    world.spheres.iter().map(|s| s.size()).sum::<f64>();
    println!("elapsed time:{:?}", timer.elapsed());

    println!("Normal,Bodies");
    let timer = Instant::now();
    world.bodies.iter().map(|s| s.size()).sum::<f64>();
    println!("elapsed time:{:?}", timer.elapsed());

    println!("Parallel,Spheres");
    let timer = Instant::now();
    world.spheres.into_par_iter().map(|s| s.size()).sum::<f64>();
    println!("elapsed time:{:?}", timer.elapsed());

    println!("Parallel,Bodies");
    let timer = Instant::now();
    world.bodies.into_par_iter().map(|s| s.size()).sum::<f64>();
    println!("elapsed time:{:?}", timer.elapsed());
}


When running with the release build, this produces the output:


Creating structs
Normal,Spheres
elapsed time:97ns
Normal,Bodies
elapsed time:323.960404ms
Parallel,Spheres
elapsed time:88.960257ms
Parallel,Bodies
elapsed time:6.015788183s

E_net4
nicekloki
  • I can't reproduce this performance issue. After several attempts with different numbers of spheres/bodies, I couldn't really tell the difference between iterating over `Vec<Sphere>` or `Vec<Box<dyn Body>>`. Besides, I get almost the same values whether I iterate in parallel or not. – jthulhu Mar 26 '23 at 16:01
  • Just to make sure: you are compiling with `cargo build --release`, right? – cafce25 Mar 26 '23 at 16:17
  • The output and measurement are from a release build – nicekloki Mar 26 '23 at 16:20
  • Which performance gain? 97ns is way faster than 88.9ms. – cafce25 Mar 26 '23 at 16:38

1 Answer

You're not using the sum, so the first one is being completely optimized out. You need to do something with it, such as printing:

let sum = world.spheres.iter().map(|s| s.size()).sum::<f64>();
println!("elapsed time:{:?} ({sum})", timer.elapsed());

This benchmark is meaningless.

  1. You're not performing the calculation you are trying to optimize (raytracing). You can't benchmark raytracing by benchmarking a sum.
  2. Raytracers are typically parallel over pixels, not objects. You might be able to make a rasterizer parallel over objects, but it makes little sense for a raytracer.
  3. You aren't comparing equivalent operations. Using Box<dyn Body> will be slower, but it allows you to store different objects, which you aren't utilizing. Compare it with storing an enum, or storing different object types in different Vecs.
  4. The parallel equivalent of iter is par_iter. Using into_par_iter will also drop the items, which is likely taking more time than the sum.
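
As a rough illustration of point 3, here is a minimal sketch of the enum alternative to Box<dyn Body>. The Shape enum, its Cube variant, and the total_size helper are hypothetical names invented for this example; they are not part of the question's code. Storing the shapes inline in a Vec avoids one heap allocation per item and replaces the vtable call with a match.

```rust
use std::f64::consts::PI;

// Hypothetical enum standing in for the question's `Box<dyn Body>`.
// Every variant is stored inline in the Vec, with no per-item allocation.
#[derive(Clone, Debug)]
pub enum Shape {
    Sphere { radius: f64 },
    Cube { side: f64 },
}

impl Shape {
    // Same role as `Body::size` in the question, dispatched with a match
    // instead of a virtual call.
    pub fn size(&self) -> f64 {
        match self {
            Shape::Sphere { radius } => 4.0 / 3.0 * PI * radius * radius * radius,
            Shape::Cube { side } => side * side * side,
        }
    }
}

// Borrows the shapes instead of consuming them, so nothing is dropped
// inside the region being timed.
pub fn total_size(shapes: &[Shape]) -> f64 {
    shapes.iter().map(Shape::size).sum()
}

fn main() {
    let shapes = vec![Shape::Sphere { radius: 4.0 }, Shape::Cube { side: 2.0 }];
    println!("total size: {}", total_size(&shapes));
}
```

With rayon in scope, swapping iter for par_iter in total_size would be the borrowing parallel counterpart mentioned in point 4. Whether that is actually faster for work this small is something to measure rather than assume.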
drewtato
  • Instead of actually using the return value you could also pass it into [`black_box`](https://doc.rust-lang.org/std/hint/fn.black_box.html) which is designed to disable such optimizations. – cafce25 Mar 26 '23 at 18:12
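
A minimal sketch of the black_box suggestion, assuming a plain slice of f64 in place of the question's spheres. The timed_sum helper is a hypothetical name introduced here. Note that black_box is documented as a best-effort hint, so this is a way to discourage the optimizer from removing the work, not a guarantee.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical helper: sums the values and returns the result together
// with the time the summation took.
pub fn timed_sum(values: &[f64]) -> (f64, Duration) {
    let timer = Instant::now();
    // Hide the input from the optimizer so the sum is not precomputed.
    let sum: f64 = black_box(values).iter().sum();
    // Treat the result as used so the computation is not discarded.
    let sum = black_box(sum);
    (sum, timer.elapsed())
}

fn main() {
    let values = vec![1.5; 1_000_000];
    let (sum, elapsed) = timed_sum(&values);
    println!("elapsed time:{:?} ({sum})", elapsed);
}
```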