How to minimize clones for implementation of NdArray

Question

I implemented a general n-dimensional array type as follows:

pub struct NdArray<T: Clone, const N: usize> {
    pub shape: [usize; N],
    pub data: Vec<T>,
}

impl<T: Clone, const N: usize> NdArray<T, N> {
    // Creates a new NdArray by cloning a slice of
    // values into a `Vec`, with a given shape
    pub fn new(array: &[T], shape: [usize; N]) -> Self {
        NdArray {
            shape,
            data: array.to_vec(),
        }
    }

    // Creates a new NdArray that takes ownership of a `Vec`
    // of values (no copy), with a given shape
    pub fn from(array: Vec<T>, shape: [usize; N]) -> Self {
        NdArray { shape, data: array }
    }
}

I am now implementing arithmetic operations for it (e.g. addition, subtraction, multiplication). The problem is that my implementation uses a lot of clones, such as this one for elementwise addition:

use std::ops::Add;

impl<T: Clone + Add<Output = T>, const N: usize> Add<&NdArray<T, N>> for &NdArray<T, N> {
    type Output = NdArray<T, N>;

    fn add(self, rhs: &NdArray<T, N>) -> Self::Output {
        assert_eq!(self.shape, rhs.shape);

        let sum_vec = self
            .data
            .iter()
            .zip(&rhs.data)
            // clones one element from each operand per position
            .map(|(a, b)| a.clone() + b.clone())
            .collect();

        NdArray::from(sum_vec, self.shape)
    }
}

And the same with several other operations:

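use std::iter::{Product, Sum};

impl<T: Clone, const N: usize> NdArray<T, N> {
    // Largest element; panics if the array is empty
    pub fn max(&self) -> T
    where
        T: Ord,
    {
        self.data.iter().max().unwrap().clone()
    }

    // Smallest element; panics if the array is empty
    pub fn min(&self) -> T
    where
        T: Ord,
    {
        self.data.iter().min().unwrap().clone()
    }

    // Sum of all elements, cloning each one
    pub fn sum(&self) -> T
    where
        T: Clone + Sum,
    {
        self.data.iter().cloned().sum()
    }

    // Product of all elements, cloning each one
    pub fn product(&self) -> T
    where
        T: Clone + Product,
    {
        self.data.iter().cloned().product()
    }
}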

This makes it very slow when handling large arrays, which is problematic.

I've already tried to fix this by using a BLAS library for certain array operations, but BLAS has no function for many elementwise ops (e.g. addition), so I'm pretty lost as to what to do.

How can I minimize clone usage in my ndarray library?

And to answer the obvious: yes, I know ndarray exists, this is an educational exercise.

You can't avoid the clone with the signature `&NdArray + &NdArray = NdArray`. You can take, for example, `NdArray + &NdArray = NdArray` and reuse the left one. — Chayim Friedman, Jul 20 '23 at 11:34
It would help if you gave examples of the types you're using for which clones are expensive. If those are simple integers or floats, clones shouldn't be a problem. — Chayim Friedman, Jul 20 '23 at 11:35
@ChayimFriedman I was under the impression that any clone would be expensive with very large arrays, even though my use case is mostly NdArrays of f64s. — JS4137, Jul 20 '23 at 12:40
But you're cloning the _elements_, not the array. Cloning primitives is always very cheap. Of course, you may have a performance problem because you're recreating the array, but that's a different problem. — Chayim Friedman, Jul 20 '23 at 13:03
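The suggestion in the first comment (take the left operand by value and reuse its buffer) could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from the question: the `for<'a> AddAssign<&'a T>` bound is introduced here so elements can be updated in place without cloning. Primitives such as `f64` satisfy it, but other element types may not.

```rust
use std::ops::{Add, AddAssign};

// Same layout as the struct in the question.
pub struct NdArray<T: Clone, const N: usize> {
    pub shape: [usize; N],
    pub data: Vec<T>,
}

// `NdArray + &NdArray`: the left operand is consumed, its buffer is
// updated in place and handed back, so no new `Vec` is allocated and
// no element is cloned.
// Assumption: `T` supports `+=` with a borrowed right-hand side.
impl<T, const N: usize> Add<&NdArray<T, N>> for NdArray<T, N>
where
    T: Clone + for<'a> AddAssign<&'a T>,
{
    type Output = NdArray<T, N>;

    fn add(mut self, rhs: &NdArray<T, N>) -> Self::Output {
        assert_eq!(self.shape, rhs.shape);

        for (a, b) in self.data.iter_mut().zip(&rhs.data) {
            *a += b;
        }

        self
    }
}
```

With this impl, `a + &b` moves `a` and returns it with the sums written into its existing allocation. A caller who still needs `a` afterwards has to clone the whole array explicitly, which at least makes the cost visible at the call site.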
