I have a struct in Rust with the following fields and their types:

  • five i8
  • six f32
  • one bool
  • one Vec of size 4
  • one Vec<Vec<Vec>> of size 4x20x7
  • one Vec<Vec> of size 4x20
  • one HashMap<i8, Vec> with 4 keys
  • one Vec<(i8, i8)> of size varying from 0 to 30

My question is: how can I speed up cloning an object of this struct? Right now I'm just using

fn py_make_copy(&self) -> PyResult<Self> {
    Ok(self.clone())
}

and it takes 4 to 6 µs, which is too slow for me. I measured it from Python. Unfortunately, I have to deep-copy those objects.

I don't know if it is relevant, but I use

#[pyclass(subclass)]
#[derive(Clone)]

when defining this struct, and #[pyo3(get)] on all of the fields.

I tried to parallelize the cloning with par_iter, but it took much longer than py_make_copy.

  • 5
    That's quite a lot of heap allocations. Maybe using arrays or something like tinyvec would help. – cdhowie Aug 18 '23 at 22:19
  • 2
    If you don't need or want deep copies for some or all of your objects, there's `Arc` or `Rc` to *share* ownership (so a copy is just an increment). – kmdreko Aug 18 '23 at 22:53
  • 1
    However, if this is used in a Python+Rust interface and 6us is too slow (and thus you're probably calling this a lot), consider crafting your API so objects move across the boundary less often and you process things on the Rust side in bulk if at all possible. – kmdreko Aug 18 '23 at 22:53
  • 2
    I think you will already be a lot faster if you transform `Vec<Vec<Vec>>` of size `4x20x7` into a single `Vec` of size `560`. In its current state, it contains `4x20 = 80`!! heap allocations. – Finomnis Aug 18 '23 at 23:21
  • 3
    Also small tip: Questions that say "how can I speed up" or similar are sooo much easier and better to answer if they contain a [MRE]. Maybe consider adding one. – Finomnis Aug 18 '23 at 23:22

1 Answer

I tried to reproduce your claims, and this is what I got:

use std::{collections::HashMap, hint::black_box, time::Instant};

#[derive(Clone)]
pub struct MyStruct {
    pub five_i8: [u8; 5],
    pub six_f32: [f32; 6],
    pub one_bool: bool,
    pub vec1: Vec<u8>,
    pub vec2: Vec<Vec<Vec<u8>>>,
    pub vec3: Vec<Vec<u8>>,
    pub hashmap: HashMap<i8, Vec<u8>>,
    pub vec4: Vec<(i8, i8)>,
}

fn main() {
    let mut s = MyStruct {
        five_i8: [42u8; 5],
        six_f32: [69.420; 6],
        one_bool: true,
        vec1: vec![42u8; 4],
        vec2: Default::default(),
        vec3: Default::default(),
        hashmap: Default::default(),
        vec4: vec![(1i8, 2i8); 30],
    };

    for i in 0..4 {
        let mut x = vec![];
        for _ in 0..20 {
            x.push(vec![1u8; 7]);
        }

        s.vec2.push(x);
        s.vec3.push(vec![42u8; 20]);
        s.hashmap.insert(i, vec![2u8; 10]);
    }

    // Blackbox to prevent optimization
    let s = black_box(s);

    let start = Instant::now();
    for _ in 0..10000 {
        let s2 = s.clone();
        black_box(s2);
    }
    let elapsed = start.elapsed();

    println!("Time: {} us", elapsed.as_micros() / 10000);
}

$ cargo run --release
Time: 17 us

Now why is that so slow? The answer is heap allocations. Everything else is very fast.

The only things in your struct that perform heap allocations are the HashMap and the Vecs. Each non-empty Vec and HashMap is one heap allocation, and cloning has to repeat every one of them.

So let's see:

  • vec1: 1 allocation
  • vec2: 1 + 4 * (1 + 20) = 85 allocations
  • vec3: 1 + 4 = 5 allocations
  • hashmap: 1 + 4 = 5 allocations
  • vec4: 1 allocation

That's a total of 97 allocations.
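
If you want to double-check the count for the nested field, here is a minimal sketch that walks the structure and counts every `Vec` that owns a buffer. The names `count_allocations` and `build_nested` are made up for this illustration, and it assumes a `Vec` with zero capacity has not allocated (which holds for `Vec<u8>` and `Vec<Vec<_>>`).

```rust
/// Counts the heap allocations owned by a nested `Vec<Vec<Vec<u8>>>`.
/// Hypothetical helper for illustration; a `Vec` with capacity 0 owns no buffer.
pub fn count_allocations(v: &Vec<Vec<Vec<u8>>>) -> usize {
    // One allocation for the outer Vec's buffer, if it has one.
    let outer = usize::from(v.capacity() > 0);
    let nested: usize = v
        .iter()
        .map(|mid| {
            // One allocation for each middle Vec's buffer...
            let mid_alloc = usize::from(mid.capacity() > 0);
            // ...plus one for each innermost Vec that holds data.
            let inner_allocs = mid.iter().filter(|inner| inner.capacity() > 0).count();
            mid_alloc + inner_allocs
        })
        .sum();
    outer + nested
}

/// Builds the same 4x20x7 shape as `vec2` in the benchmark above.
pub fn build_nested() -> Vec<Vec<Vec<u8>>> {
    vec![vec![vec![1u8; 7]; 20]; 4]
}
```

Calling `count_allocations(&build_nested())` should match the `1 + 4 * (1 + 20)` figure for vec2 in the list above.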

The easiest fix is to change the Vec<Vec<Vec<>>> into a single, flattened Vec, and to do the same for the Vec<Vec<>>. This would reduce vec2 and vec3 down to a single allocation each, and the total count down to 9 allocations.

Like this:

use std::{collections::HashMap, hint::black_box, time::Instant};

#[derive(Clone)]
pub struct MyStruct {
    pub five_i8: [u8; 5],
    pub six_f32: [f32; 6],
    pub one_bool: bool,
    pub vec1: Vec<u8>,
    pub vec2: Vec<u8>,
    pub vec3: Vec<u8>,
    pub hashmap: HashMap<i8, Vec<u8>>,
    pub vec4: Vec<(i8, i8)>,
}

fn main() {
    let mut s = MyStruct {
        five_i8: [42u8; 5],
        six_f32: [69.420; 6],
        one_bool: true,
        vec1: vec![42u8; 4],
        vec2: vec![10u8; 560],
        vec3: vec![42u8; 80],
        hashmap: Default::default(),
        vec4: vec![(1i8, 2i8); 30],
    };

    for i in 0..4 {
        s.hashmap.insert(i, vec![2u8; 10]);
    }

    // Blackbox to prevent optimization
    let s = black_box(s);

    let start = Instant::now();
    for _ in 0..10000 {
        let s2 = s.clone();
        black_box(s2);
    }
    let elapsed = start.elapsed();

    println!("Time: {} us", elapsed.as_micros() / 10000);
}

$ cargo run --release
Time: 1 us
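
The flattened layout does change how you address elements: instead of `v[i][j][k]` you compute an offset into the single buffer. A minimal sketch of one way to wrap that, assuming row-major order; `Grid3` and its methods are hypothetical names for this example, not something from your code or from PyO3:

```rust
/// Hypothetical wrapper: a 3-D grid stored in one contiguous `Vec`,
/// so cloning it costs a single heap allocation.
#[derive(Clone, Debug, PartialEq)]
pub struct Grid3 {
    dims: (usize, usize, usize),
    data: Vec<u8>,
}

impl Grid3 {
    /// Creates a `d0 x d1 x d2` grid with every cell set to `fill`.
    pub fn new(d0: usize, d1: usize, d2: usize, fill: u8) -> Self {
        Self {
            dims: (d0, d1, d2),
            data: vec![fill; d0 * d1 * d2],
        }
    }

    /// Maps `(i, j, k)` to an offset in the flat buffer (row-major order).
    fn offset(&self, i: usize, j: usize, k: usize) -> usize {
        let (d0, d1, d2) = self.dims;
        assert!(i < d0 && j < d1 && k < d2, "index out of bounds");
        (i * d1 + j) * d2 + k
    }

    /// Reads the cell at `(i, j, k)`.
    pub fn get(&self, i: usize, j: usize, k: usize) -> u8 {
        self.data[self.offset(i, j, k)]
    }

    /// Writes `value` to the cell at `(i, j, k)`.
    pub fn set(&mut self, i: usize, j: usize, k: usize, value: u8) {
        let o = self.offset(i, j, k);
        self.data[o] = value;
    }

    /// Total number of cells in the grid.
    pub fn len(&self) -> usize {
        self.data.len()
    }
}
```

With this, `Grid3::new(4, 20, 7, 0)` would stand in for the 4x20x7 field, and the derived `clone()` is still a deep copy, just of one buffer instead of 85.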
Finomnis