
Here is a simple example demonstrating what I'm trying to do:

use std::collections::HashSet;

fn main() {
    let mut sets: Vec<HashSet<char>> = vec![];

    let mut set = HashSet::new();
    set.insert('a');
    set.insert('b');
    set.insert('c');
    set.insert('d');
    sets.push(set);

    let mut set = HashSet::new();
    set.insert('a');
    set.insert('b');
    set.insert('d');
    set.insert('e');
    sets.push(set);

    let mut set = HashSet::new();
    set.insert('a');
    set.insert('b');
    set.insert('f');
    set.insert('g');
    sets.push(set);

    // Simple intersection of two sets
    let simple_intersection = sets[0].intersection(&sets[1]);
    println!("Intersection of 0 and 1: {:?}", simple_intersection);

    let mut iter = sets.iter();
    let base = iter.next().unwrap().clone();
    let intersection = iter.fold(base, |acc, set| acc.intersection(set).map(|x| x.clone()).collect());
    println!("Intersection of all: {:?}", intersection);
}

This solution uses fold to "accumulate" the intersection, using a clone of the first set as the initial value.

Intersections are lazy iterators that yield references to the elements of the involved sets. Since the fold accumulator must keep the type of its initial value (an owned HashSet<char>), each element has to be cloned: we can't build a set of owned data from references without cloning. I think I understand this.
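For char specifically, the "clone" in that fold is only a copy of a small Copy value; the larger cost is allocating a new HashSet on every step. A sketch of the same fold that makes the copy explicit with Iterator::copied (the function name intersect_all_copied is illustrative, and an empty input is assumed to produce an empty set):

```rust
use std::collections::HashSet;

// Same fold as in the question, starting from a clone of the first set.
// `intersection` yields `&char`; because `char` is `Copy`, `.copied()`
// turns each `&char` back into an owned `char` with a plain copy.
fn intersect_all_copied(sets: &[HashSet<char>]) -> HashSet<char> {
    let mut iter = sets.iter();
    let base = match iter.next() {
        Some(first) => first.clone(),
        // Assumption for this sketch: no sets means an empty intersection.
        None => return HashSet::new(),
    };
    // Each step still builds a brand-new HashSet from the surviving elements.
    iter.fold(base, |acc, set| acc.intersection(set).copied().collect())
}
```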

For example, this doesn't work:

let mut iter = sets.iter();
let mut base = iter.next().unwrap();
let intersection = iter.fold(base, |acc, set| acc.intersection(set).collect());
println!("Intersection of all: {:?}", intersection);

error[E0277]: a value of type `&HashSet<char>` cannot be built from an iterator over elements of type `&char`
  --> src/main.rs:41:73
   |
41 |     let intersection = iter.fold(base, |acc, set| acc.intersection(set).collect());
   |                                                                         ^^^^^^^ value of type `&HashSet<char>` cannot be built from `std::iter::Iterator<Item=&char>`
   |
   = help: the trait `FromIterator<&char>` is not implemented for `&HashSet<char>`

Even understanding this, I still don't want to clone the data. In theory it shouldn't be necessary: the data is already in the original vector, so I should be able to work with references. That would speed up my algorithm a lot. This is a purely academic pursuit, so I am interested in making it as fast as possible.

To do this, I would need to accumulate into a HashSet<&char>, but I can't do that because I can't intersect a HashSet<&char> with a HashSet<char> in the closure. So it seems like I'm stuck. Is there any way to do this?
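A HashSet<&char> accumulator does not actually need intersection: contains can check each reference against the original HashSet<char> directly. A minimal sketch of that idea (intersect_all_refs is a hypothetical name): it allocates one set of references up front, then only shrinks it with retain, so no char is cloned and the result borrows from sets.

```rust
use std::collections::HashSet;

// The accumulator holds `&char` references that borrow from `sets`.
// Instead of `intersection` (which needs both sides to have the same
// element type), each step keeps only the references whose target is
// contained in the next `HashSet<char>`.
fn intersect_all_refs(sets: &[HashSet<char>]) -> HashSet<&char> {
    let mut iter = sets.iter();
    let base: HashSet<&char> = match iter.next() {
        Some(first) => first.iter().collect(),
        None => return HashSet::new(),
    };
    iter.fold(base, |mut acc, set| {
        // `retain` passes `&&char`; `*c` is the `&char` that `contains` expects.
        acc.retain(|c| set.contains(*c));
        acc
    })
}
```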

Alternatively, I could make a set of references for each set in the vector, but that doesn't really seem much better. Would it even work? I might run into the same problem but with double references instead.

Finally, I don't actually need to retain the original data, so I'd be okay moving the elements into the accumulator set. I can't figure out how to make this happen, since I have to go through intersection, which gives me references.

Are any of the above proposals possible? Is there some other zero-copy solution that I'm not seeing?

TechnoSam

3 Answers


Finally, I don't actually need to retain the original data.

This makes it really easy.

First, optionally sort the sets by size, smallest first, so the set being filtered starts out as small as possible. Then:

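// Requires at least one set: split_at_mut(1) panics on an empty vector.
let (intersection, others) = sets.split_at_mut(1);
let intersection = &mut intersection[0];
for other in others {
    // Keep only the elements that also appear in this set.
    intersection.retain(|e| other.contains(e));
}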
orlp
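Put together as a self-contained function with the optional sort by size, orlp's split_at_mut/retain approach might look like the following sketch (intersect_in_place is a hypothetical name). It takes the vector by value since the original data isn't needed afterwards, and it returns an empty set for empty input rather than panicking:

```rust
use std::collections::HashSet;

// Consumes the sets and returns their intersection without cloning any element.
fn intersect_in_place(mut sets: Vec<HashSet<char>>) -> HashSet<char> {
    if sets.is_empty() {
        // `split_at_mut(1)` would panic on an empty vector.
        return HashSet::new();
    }
    // Smallest set first, so each `retain` pass touches as few elements as possible.
    sets.sort_by_key(|s| s.len());
    let (intersection, others) = sets.split_at_mut(1);
    let intersection = &mut intersection[0];
    for other in others.iter() {
        intersection.retain(|e| other.contains(e));
    }
    // Move the filtered first set out of the vector instead of cloning it.
    sets.swap_remove(0)
}
```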

Finally, I don't actually need to retain the original data, so I'd be okay moving the elements into the accumulator set.

The retain method will work perfectly for your requirements then:

use std::collections::HashSet;

fn intersection(mut sets: Vec<HashSet<char>>) -> HashSet<char> {
    if sets.is_empty() {
        return HashSet::new();
    }

    if sets.len() == 1 {
        return sets.pop().unwrap();
    }

    // Move one set out of the vector and keep only the items every other set contains.
    let mut result = sets.pop().unwrap();
    result.retain(|item| {
        sets.iter().all(|set| set.contains(item))
    });
    result
}


pretzelhammer
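The same retain idea can also be written with Iterator::reduce (stable since Rust 1.51), which consumes the vector and uses the first set as the accumulator. This is an alternative sketch, not pretzelhammer's code: it filters the accumulator once per remaining set instead of checking every set for each item, and intersection_reduce is a hypothetical name. It returns None for an empty vector.

```rust
use std::collections::HashSet;

// The first set becomes the accumulator; every later set is used to filter
// it and then dropped. No element is cloned.
fn intersection_reduce(sets: Vec<HashSet<char>>) -> Option<HashSet<char>> {
    sets.into_iter().reduce(|mut acc, set| {
        acc.retain(|item| set.contains(item));
        acc
    })
}
```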

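You can do it in a fully lazy way using filter and all. This works when sets is a reference such as &[HashSet<char>]: the move closure then copies the reference instead of moving the vector while sets[0].iter() still borrows it. The |&c| pattern turns the &&char that filter passes into the &char that contains expects:

sets[0].iter().filter(move |&c| sets[1..].iter().all(|s| s.contains(c)))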


Jmb
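Wrapped in a function, the lazy filter/all version might look like this sketch (intersection_lazy is a hypothetical name). Taking sets as a slice reference is what lets the move closure copy the reference, and the returned iterator borrows from sets. It panics if sets is empty because of the sets[0] index.

```rust
use std::collections::HashSet;

// Lazily yields references to the elements of `sets[0]` that appear in
// every other set. Nothing is checked until the iterator is consumed.
fn intersection_lazy<'a>(sets: &'a [HashSet<char>]) -> impl Iterator<Item = &'a char> + 'a {
    sets[0]
        .iter()
        // `sets` is a `&[_]`, so `move` copies the reference into the closure.
        .filter(move |&c| sets[1..].iter().all(|s| s.contains(c)))
}
```

If an owned set is needed at the end, the iterator can be finished with .copied().collect::<HashSet<char>>().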