3

I have a function that operates on a Vec<T> which purpose is to extend the vector with new items generated using reference to existing items. I'm trying to run the generation of new data in parallel using rayon.

This is a minimal example:

use itertools::Itertools;
use rayon::prelude::*;

fn main() {
    let mut foo = Foo {
        data: (0..1000).into_iter().collect(),
    };
    foo.run();
}

struct Foo<T> {
    data: Vec<T>,
}

type Pair<'a, T> = (&'a T, &'a T);

impl<'a, T: Clone + 'a> Foo<T>
where
    Vec<Pair<'a, T>>: IntoParallelIterator<Item = Pair<'a, T>>,
    [T; 2]: IntoParallelIterator,
    Vec<T>: FromParallelIterator<<[T; 2] as IntoParallelIterator>::Item>,
{
    fn run(&'a mut self) {
        let combinations: Vec<Pair<'a, T>> = self
            .data
            .iter()
            .combinations(2)
            .map(|x| (x[0], x[1]))
            .collect();

        let mut new_combinations: Vec<T> = combinations
            .into_par_iter()
            .flat_map(|(a, b)| bar(a, b))
            .collect();

        self.data.append(&mut new_combinations);
    }
}


fn bar<T: Clone>(a: &T, b: &T) -> [T; 2] {
    [a.clone(), b.clone()]
}

You can find a link to Playground here.

Building the above example raises this error:

error[E0502]: cannot borrow `self.data` as mutable because it is also borrowed as immutable
  --> src/main.rs:36:9
   |
17 |   impl<'a, T: Clone + 'a> Foo<T>
   |        -- lifetime `'a` defined here
...
24 |           let combinations: Vec<Pair<'a, T>> = self
   |  ___________________________----------------___-
   | |                           |
   | |                           type annotation requires that `self.data` is borrowed for `'a`
25 | |             .data
26 | |             .iter()
   | |___________________- immutable borrow occurs here
...
36 |           self.data.append(&mut new_combinations);
   |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here

As far as I understand since I am collecting into a let new_combinations: Vec<T> there should be no immutable references to self.data and I should be able in theory to borrow it mutably to append the new combinations. However, it seems that self.data is borrowed for 'a which extends beyond the scope of this method. I cannot find a way to avoid specifying the lifetime fn run(&'a mut self) since I need to specify that the lifetimes of the references to the items of self.data cannot outlive self when creating the combinations.

Is there a way to allow this method to operate as expected, that is: 1) select a list of references to the items in self.data, 2) apply a function that creates new items T in parallel and finally 3) update self.data with the new items.

Note that as a workaround one could return the new_combinations from the method and append them to self.data separately.

Would be great if all of this would be possible by avoiding as many collect() as possible while operating directly with iterators only.

Nick
  • 10,309
  • 21
  • 97
  • 201
  • Incidentally, `0..1000` is an iterator, you don't need `.into_iter()`. See [the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f8c578497ad1754ab8e0ba933bf8b70d). – jthulhu Sep 15 '22 at 14:59
  • *select a list of references to the items in self.data [...] update self.data with the new items* - this is not possible because updating self.data potentially invalidates the references (from borrow checker's POV). – user4815162342 Sep 15 '22 at 15:00
  • 1
    @user4815162342 but `bar()` clones them; the lifetimes are just annotated incorrectly – Finomnis Sep 15 '22 at 15:01
  • *"since I need to specify that the lifetimes of the references to the items of self.data cannot outlive self when creating the combinations.*" - that's not true. You don't need lifetime annotations for Rust to know that. It will figure that out by itself. – Finomnis Sep 15 '22 at 15:30

1 Answers1

3

The elements in new_combinations are cloned and therefore don't borrow from combinations any more. Your annotations, however, state that T: 'a, meaning Rust has to treat them as still borrowed.

I personally think you are being way too excessive with the lifetimes annotations here, you could remove almost all of them. The compiler is very good in figuring them out automatically in most situations.

Further, your trait restrictions were sadly mislead by the compiler hints. They are all automatically fulfilled once you specify T: Clone + Send + Sync.

Here you go:

use itertools::Itertools;
use rayon::prelude::*;

fn main() {
    let mut foo = Foo {
        data: (0..10).collect(),
    };
    foo.run();
    println!("{:?}", foo.data);
}

struct Foo<T> {
    data: Vec<T>,
}

type Pair<'a, T> = (&'a T, &'a T);

impl<T: Clone + Send + Sync> Foo<T> {
    fn run(&mut self) {
        let combinations: Vec<Pair<T>> = self
            .data
            .iter()
            .combinations(2)
            .map(|x| (x[0], x[1]))
            .collect();

        let mut new_combinations: Vec<T> = combinations
            .into_par_iter()
            .flat_map(|(a, b)| bar(a, b))
            .collect();

        self.data.append(&mut new_combinations);
    }
}

fn bar<T: Clone>(a: &T, b: &T) -> [T; 2] {
    [a.clone(), b.clone()]
}
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 1, 2, 1, 3, 1, 4, 1, 5, 1, 6, 1, 7, 1, 8, 1, 9, 2, 3, 2, 4, 2, 5, 2, 6, 2, 7, 2, 8, 2, 9, 3, 4, 3, 5, 3, 6, 3, 7, 3, 8, 3, 9, 4, 5, 4, 6, 4, 7, 4, 8, 4, 9, 5, 6, 5, 7, 5, 8, 5, 9, 6, 7, 6, 8, 6, 9, 7, 8, 7, 9, 8, 9]

Further, there is really no need for the Pair type:

use itertools::Itertools;
use rayon::prelude::*;

fn main() {
    let mut foo = Foo {
        data: (0..10).collect(),
    };
    foo.run();
    println!("{:?}", foo.data);
}

struct Foo<T> {
    data: Vec<T>,
}

impl<T: Clone + Send + Sync> Foo<T> {
    fn run(&mut self) {
        let combinations: Vec<_> = self
            .data
            .iter()
            .combinations(2)
            .map(|x| (x[0], x[1]))
            .collect();

        let mut new_combinations: Vec<T> = combinations
            .into_par_iter()
            .flat_map(|(a, b)| bar(a, b))
            .collect();

        self.data.append(&mut new_combinations);
    }
}

fn bar<T: Clone>(a: &T, b: &T) -> [T; 2] {
    [a.clone(), b.clone()]
}

About the last part of your question, the request to remove all .collect() calls if possible:

Sadly, I don't think you will be able to remove any of the collect()s. At least not with your current code layout. You definitely need to collect() between combinations() and into_par_iter(), and you also definitely need to collect() before append, because you need to release all references to self.data before you write into it.

Finomnis
  • 18,094
  • 1
  • 20
  • 27
  • 1
    Kudos for figuring out the minimally required trait bounds! When one just removes the multitude of bounds specified by the OP, the compiler complains of missing trait `[(&T, &T)]: Sized`, which bears no hint of `T: Send + Sync` in fact being missing. – user4815162342 Sep 15 '22 at 15:03
  • Yah, in complex situations the compiler hints really struggle. I just knew that parallel computations require `Send + Sync`, for obvious reasons, so I tried it. – Finomnis Sep 15 '22 at 15:05
  • Thank you so much! I was indeed wondering in the beginning why I didn't get complains for missing `Send + Sync` and started following the compiler suggestions up to here. – Nick Sep 15 '22 at 15:18
  • @Nick But as a rule of thumb: Don't annotate lifetimes unless absolutely necessary. You will most certainly annotate them wrong :) – Finomnis Sep 15 '22 at 15:28