
I would like to efficiently index into an ndarray using a boolean mask. To show what I mean, here is some working NumPy code, followed by my attempt in Rust ndarray, which works but is extremely inefficient.

Numpy:

import numpy as np

shape = (100, 100, 100)

grouping_array = np.random.randint(0, 100, size=shape)
data_array = np.random.rand(*shape)

for i in range(1, 100):
    ith_mean = data_array[grouping_array == i].mean()
    print(ith_mean)

Rust ndarray:

use ndarray::{Array, IxDyn};

fn group_means(
    data: &Array<f32, IxDyn>,
    grouping_var: &Array<f32, IxDyn>,
    n_groups: i32,
) {
    for group in 1..n_groups {
        // Boolean mask: true where the element belongs to the current group.
        let index_array = grouping_var.mapv(|x| x == group as f32);
        // Elements outside the group are replaced by 0 rather than dropped.
        let roi_data = Array::from_iter(
            data
            .iter()
            .zip(index_array.iter())
            .map(|(x, y)| if *y { *x } else { 0. })
        );

        let mean_roi = roi_data.mean().unwrap();
        println!("group {}; mean {}", group, mean_roi);
    }
}

Here, each iteration of the n_groups loop takes about as long as the entire numpy script, which finishes in less than a second. Is there a better way to do this in the rust-ndarray version?

Leo
  • This is the obligatory "did you run with `--release`?" comment for performance questions. – cafce25 Dec 28 '22 at 18:07
  • I am sorry, yes, that was the main problem. However, even with `--release` and after installing using `cargo install --path .`, it is still noticeably slower than the numpy version (between 1 and 2 seconds slower). That could be for a number of reasons, I guess, but I thought that if there is a more efficient/idiomatic way of doing it in Rust, that would be good to know. But yes, with the compiler optimisations even this will be good enough for now. Thanks for the quick answer! – Leo Dec 28 '22 at 18:29
  • Do you feel this is answering the question enough, or are you still seeking a faster alternative? If you don't, I can close this as a duplicate of the canonical [Why is my Rust program slower than the equivalent Java program?](https://stackoverflow.com/questions/25255736/why-is-my-rust-program-slower-than-the-equivalent-java-program) – Chayim Friedman Dec 28 '22 at 18:39
  • I think I would still like to see if someone has a better way of doing the slicing, as this is still quite a bit slower even with optimisations. – Leo Dec 28 '22 at 18:42
  • I don't know if numpy relies on the current processor to perform optimizations, but try setting the environment variable `RUSTFLAGS="-Ctarget-cpu=native"` before building (you don't need `cargo install`). – Chayim Friedman Dec 28 '22 at 18:45
  • Thanks! I tried that just now, but it did not really change the performance. I think my code is just not as efficient as it could be, but there is probably no easy solution. I will have to do some more systematic performance testing. – Leo Dec 28 '22 at 19:12
  • Are these two pieces of code even equivalent? Does the numpy version consider values that line up with a "false" to be zero? Or are they omitted entirely? The Rust version considers them to be zero, but if the Python code omits them, this will produce a different result when you ask for the mean. – cdhowie Dec 28 '22 at 19:25
  • And I just realised my Rust code isn't even correct in terms of calculating the mean, since I include the zeros among the elements in the mean calculation even though they shouldn't be there. – Leo Dec 28 '22 at 19:29
  • Yes, you are right @cdhowie, I have to go back to the drawing board, sorry. I just cannot for the life of me do this indexing correctly in ndarray. – Leo Dec 28 '22 at 19:30
  • @Leo Using `filter_map` instead of `map` should be sufficient. You just return `None` instead of 0, and `Some(*x)` instead of `*x`. – cdhowie Dec 28 '22 at 19:44
  • OK, yes, that works; at least it is correct for now. Thank you! – Leo Dec 28 '22 at 19:52
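
The `filter_map` suggestion from the comments could look roughly like the sketch below. It is only an illustration, not the code the asker ended up with: the helper name `masked_mean` is hypothetical, and it is written against plain iterators over `&f32` so that it does not depend on any crate. Since ndarray's `iter()` yields such an iterator, `data.iter()` and `grouping_var.iter()` should be usable as arguments. Folding into a sum and a count also avoids allocating a temporary array per group.

```rust
/// Mean of the `data` values whose paired `groups` value equals `group`.
/// Returns `None` when no element belongs to the group.
/// (`masked_mean` is a hypothetical helper name used for illustration.)
fn masked_mean<'a>(
    data: impl Iterator<Item = &'a f32>,
    groups: impl Iterator<Item = &'a f32>,
    group: f32,
) -> Option<f32> {
    let (sum, count) = data
        .zip(groups)
        // Drop elements outside the group instead of replacing them with 0.
        .filter_map(|(x, g)| if *g == group { Some(*x) } else { None })
        // Accumulate the sum and the number of kept elements in one pass.
        .fold((0.0f32, 0usize), |(s, n), x| (s + x, n + 1));

    if count == 0 {
        None
    } else {
        Some(sum / count as f32)
    }
}
```

Unlike the version in the question, values outside the group do not contribute to the element count, which matches what `data_array[grouping_array == i].mean()` computes in the NumPy code.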

1 Answer


This is likely no surprise to others, but since my grouping_var array should (in my use case) always be a 3D array, I changed its type (and therefore also that of index_array) from &Array<f32, IxDyn> to &Array<f32, Ix3>, which dramatically improved performance.
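
Independently of the dimension type, the loop in the question still rescans both arrays once per group. A different technique, not the one described in this answer, is to accumulate a sum and a count for every group in a single pass. The sketch below is only an illustration under stated assumptions: the function name `group_means_single_pass` is hypothetical, the labels are assumed to be integer-valued `f32` as in the question, and the inputs are plain iterators over `&f32` (such as those returned by ndarray's `iter()`).

```rust
/// Per-group means computed in one pass over the data.
/// Assumes `groups` holds integer-valued labels stored as f32; labels that are
/// negative, NaN, or >= `n_groups` are skipped.
/// (`group_means_single_pass` is a hypothetical name used for illustration.)
fn group_means_single_pass<'a>(
    data: impl Iterator<Item = &'a f32>,
    groups: impl Iterator<Item = &'a f32>,
    n_groups: usize,
) -> Vec<Option<f32>> {
    // f64 accumulators reduce rounding error when many f32 values are summed.
    let mut sums = vec![0.0f64; n_groups];
    let mut counts = vec![0usize; n_groups];

    for (x, g) in data.zip(groups) {
        // The comparison is false for NaN, so NaN labels are skipped too.
        if *g >= 0.0 && (*g as usize) < n_groups {
            let idx = *g as usize;
            sums[idx] += f64::from(*x);
            counts[idx] += 1;
        }
    }

    // A group with no members has no mean, so it maps to `None`.
    sums.iter()
        .zip(&counts)
        .map(|(s, &n)| if n == 0 { None } else { Some((s / n as f64) as f32) })
        .collect()
}
```

The result at index `i` is the mean for group `i`, so the work is proportional to the number of elements rather than to the number of elements times the number of groups.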

Leo