Determine unique values across multiple sets

Question

In this project, there are multiple sets, each holding values from 1 to 9. I need to efficiently determine which values are unique to one set, i.e. present in that set but in none of the others.

For Example:

std::set<int> s_1 = { 1, 2, 3, 4, 5 };
std::set<int> s_2 = { 2, 3, 4 };
std::set<int> s_3 = { 2, 3, 4, 6 };

Note: The number of sets is unknown until runtime.

As you can see, s_1 contains the unique values 1 and 5, and s_3 contains the unique value 6.

After determining the unique values, the aforementioned sets should then just contain the unique values like:

// s_1 { 1, 5 }
// s_2 { 2, 3, 4 }
// s_3 { 6 }

What I've tried so far is to loop through all the sets and record how many times each number has appeared. However, I'd like to know whether there is a more efficient solution out there.

I don't think that there is a more efficient solution than checking every number in all sets. If you started from s_1, shouldn't 2, 3, and 4 be in s_1 instead of s_2? – o_weisman Jan 21 '15 at 08:47
2, 3, and 4 are in all sets, shouldn't they appear in no set at the end (i.e. shouldn't set2 be empty)? – Félix Cantournet Jan 21 '15 at 08:50
All the `sets` get their data independently of one another, but for the purposes of this program, I want to cull non-unique values from sets that contain unique values after I get the data. – Hayden Jan 21 '15 at 08:50
Cull non-unique values only from sets that contain at least 1 unique value? Well, you can surely do that with a combination of set_difference and set_symmetric_difference. – Félix Cantournet Jan 21 '15 at 08:54
@FélixCantournet I'll look more into them. Thank you. Can it be done with more than 2 sets at once or only 2 at a time? – Hayden Jan 21 '15 at 08:56
If you really need to care about efficiency, you might want to benchmark with `std::bitset<10>` (or 9 if you can be bothered adjusting the indices all the time) instead: `bitset` supports efficient bitwise operations. – Tony Delroy Jan 21 '15 at 09:00
I agree with o_weisman (1st comment). For this specific problem the default intersection (with only two arguments) won't help. On the other hand, for a large amount of data it scales horizontally very well: each thread looks for uniques in a fraction of the sets and builds its own table of results. The tables are then merged (summed up) to get the final solution. – user2706534 Jan 21 '15 at 09:09
@TonyD would the numbers be represented as a single bit determined by their index in the bitset? – Hayden Jan 21 '15 at 09:14
@Hayden Yes, essentially. Very efficient if you have small contiguous numbers. If you need both 12 and 546,765,876, it's rather less efficient. Then again, you can use a table to map contiguous indexes to the corresponding non-contiguous integers (or pretty much any value), although that table must be constructed at runtime, which I suppose has a cost. – Félix Cantournet Jan 21 '15 at 09:20
@FélixCantournet Silly question, but once I've calculated the unique values, how would I represent these values as integers again instead of 1-bit indexes that reference the value? – Hayden Jan 21 '15 at 09:23
@Hayden, actually not that silly hehe. How about `for (int i = 0; i < bitset.size(); i++) { if (bitset[i]) { intset.insert(i); } }`, assuming index 0 matches integer 0. You might need an offset. You can think of bitset as `vector<bool>`. – Félix Cantournet Jan 21 '15 at 09:35

score 0 · Accepted Answer · edited May 23 '17 at 12:06

The C++ standard library has algorithms for intersection, difference and union operations on two sets.

If I understood your problem correctly, you could do this: intersect all the sets (in a loop) to determine a base, and then take the difference between each set and the base. You could benchmark this against your current implementation; it should be faster.

Check out this answer.

Getting Union, Intersection, or Difference of Sets in C++
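
For illustration, here is a minimal sketch of the intersect-then-subtract idea, assuming the sets are kept in a std::vector<std::set<int>>. The function name remove_common is made up for this example and is not from the question. Note that this only strips values present in every set; a value shared by some but not all sets would survive, so it is not quite the same as "unique to one set".

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical helper: removes from every set the values present in ALL sets.
// A value shared by some, but not all, sets is left in place.
void remove_common(std::vector<std::set<int>>& sets) {
    if (sets.empty()) return;

    // Fold set_intersection over all sets to build the common "base".
    std::set<int> base = sets.front();
    for (std::size_t i = 1; i < sets.size(); ++i) {
        std::set<int> tmp;
        std::set_intersection(base.begin(), base.end(),
                              sets[i].begin(), sets[i].end(),
                              std::inserter(tmp, tmp.begin()));
        base.swap(tmp);
    }

    // Subtract the base from each set.
    for (auto& s : sets) {
        std::set<int> diff;
        std::set_difference(s.begin(), s.end(), base.begin(), base.end(),
                            std::inserter(diff, diff.begin()));
        s.swap(diff);
    }
}
```

With the three sets from the question this leaves s_1 as { 1, 5 }, s_3 as { 6 }, and s_2 empty (rather than { 2, 3, 4 } as the question asks), which is the point raised in the comments.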

EDIT: cf. Tony D.'s comment: you can basically do the same operation using a std::bitset<> and bitwise operators (&, |, etc.), which should be faster. Depending on the actual size of your input, it might be well worth a try.
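
A rough sketch of how the bitset variant could look, assuming each set is stored as a std::bitset<10> where bit i means "value i is present" (bit 0 stays unused). The alias Bits and the functions keep_unique and to_values are names invented for this example. It tracks which values were seen once and which were seen more than once, and only rewrites sets that actually hold a unique value, as described in the question's comments.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

using Bits = std::bitset<10>;  // bit i set <=> value i is in the set (bit 0 unused)

// Hypothetical helper: for every set that holds at least one value found in
// no other set, keep only those unique values; other sets stay untouched.
void keep_unique(std::vector<Bits>& sets) {
    Bits once, more;  // seen in at least one set / seen in two or more sets
    for (const Bits& s : sets) {
        more |= once & s;
        once |= s;
    }
    const Bits unique = once & ~more;  // values seen in exactly one set

    for (Bits& s : sets) {
        const Bits mine = s & unique;
        if (mine.any()) s = mine;
    }
}

// Converts a bitset back to the integers it represents (index == value).
std::vector<int> to_values(const Bits& b) {
    std::vector<int> out;
    for (std::size_t i = 0; i < b.size(); ++i)
        if (b[i]) out.push_back(static_cast<int>(i));
    return out;
}
```

Whether this beats the std::set version for nine values is something only a benchmark on the real input would tell.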

answered Jan 21 '15 at 08:51 by Félix Cantournet; edited May 23 '17 at 12:06 by Community

I'm going to accept this answer as there is good information here. – Hayden Jan 21 '15 at 09:07
@Hayden - I'd say your solution is better. Using the std::set_* functions makes the code look poor and hard to read. Try writing A - A ^ ( B u C ) with those methods and you might prefer your own solution. – kiranpradeep Jan 21 '15 at 09:09

score 0 · Answer 2 · answered Jan 21 '15 at 08:56

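I would suggest something like this in C# (assuming sets is the collection of all your sets):

Dictionary<int, int> result = new Dictionary<int, int>();
foreach (var set in sets) {             // sets: the collection of all sets
   foreach (int i in set) {
      if (!result.ContainsKey(i))
         result.Add(i, 1);              // first occurrence of this value
      else
         result[i] = result[i] + 1;     // seen again in another set
   }
}

Now the numbers with a count of exactly 1 are the unique ones; then find the sets that contain these numbers...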
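
Since the question is about C++, the same counting idea might be sketched as follows. This assumes the sets live in a std::vector<std::set<int>> and that every value is in the range 1 to 9, as stated in the question; the function name keep_unique_by_count is made up for this example.

```cpp
#include <array>
#include <set>
#include <vector>

// Hypothetical helper: counts in how many sets each value occurs, then, for
// every set containing at least one value with count 1, keeps only those.
// Assumes all values are in the range 1..9.
void keep_unique_by_count(std::vector<std::set<int>>& sets) {
    std::array<int, 10> count{};  // index = value; index 0 is unused
    for (const auto& s : sets)
        for (int v : s)
            ++count[v];

    for (auto& s : sets) {
        std::set<int> unique;
        for (int v : s)
            if (count[v] == 1) unique.insert(v);
        if (!unique.empty()) s.swap(unique);  // sets without uniques stay as-is
    }
}
```

Each element is visited twice, once to count and once to filter, so the work is linear in the total number of elements.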

answered Jan 21 '15 at 08:56 by John Youssef

That's how I essentially wrote my initial solution up. – Hayden Jan 21 '15 at 08:57

score 0 · Answer 3 · answered Jan 21 '15 at 09:26

I would suggest:

Start by inserting all the elements of all the sets into a multimap.
Here each element is a key and the set name will be the value.
Once your multimap is filled with all the elements of all the sets, loop through the multimap and count the occurrences of each element.
If the count is 1 for any key, that element is unique, and its value will be the name of the set it belongs to.
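
The steps above could be sketched roughly like this in C++. As an assumption for the example, the "set name" is replaced by the set's index in a std::vector, and the function name unique_per_set is invented here. It returns the unique values per set instead of modifying the input.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical helper: maps every value to the index of each set containing
// it, then reports, per set, the values whose key occurs exactly once.
std::vector<std::set<int>> unique_per_set(const std::vector<std::set<int>>& sets) {
    std::multimap<int, std::size_t> owners;  // value -> index of a set holding it
    for (std::size_t i = 0; i < sets.size(); ++i)
        for (int v : sets[i])
            owners.emplace(v, i);

    std::vector<std::set<int>> result(sets.size());
    for (auto it = owners.begin(); it != owners.end();
         it = owners.upper_bound(it->first)) {
        if (owners.count(it->first) == 1)  // key occurs in exactly one set
            result[it->second].insert(it->first);
    }
    return result;
}
```

For only nine possible values a plain array of counters is likely simpler, but the multimap version also works when the values are large or sparse.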
