
What is the best algorithm for finding which sets in a finite collection of sets are subsets of a given set?

For example, if

A = {1, 2}
B = {2, 3, 4}
C = {3, 5}
D = {6}

and X = {1, 2, 3, 5}

Then, A and C are subsets of X.

Is there an algorithm that can do this in linear time?

Implementation note: The members of the sets generally come from a very limited range, so would it be a good idea to use a C++ bitset to implement the algorithm?

Edit: The number of sets in the collection is generally much greater than the number of elements in X (unlike in the example). Is there a way to do this in time linear in the number of elements in X, perhaps using hashing or something similar?

Mohammad
  • No way to do this in true linear time. Testing if a set contains another set will always be technically quadratic time, *but* using a hashtable will make such a problem be linear time in practice (if the sets are of a reasonable length). So the answer to your question is that the time complexity will be `M*N*Q`, if M is the number of sets (A-D), N is the size of the largest of those sets, and Q is the size of the set X. – David Robinson Sep 24 '12 at 06:35
  • Could you give me a link or probably the name of the algorithm? – Mohammad Sep 24 '12 at 06:46
  • It's not the algorithm that's important so much as the data structure. As you (and @amit) mention, a bitset is useful if you have a limited number of possible elements. A [hash table](http://en.wikipedia.org/wiki/Hash_table) is another very useful data structure (in C++ it's referred to as an [unordered_map](http://en.wikipedia.org/wiki/Unordered_map_%28C%2B%2B%29)). – David Robinson Sep 24 '12 at 06:48

2 Answers


Let's assume for a moment that there are at most 64 possible elements.

Then, if you represent each element as a bit, you can use a 64-bit integer to represent each set, and a & b is the set intersection of a and b.
a is a subset of b if and only if a & b == a.

Of course, you can use a bitset if you need more than 64 bits.
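A minimal sketch of the bitmask test above, assuming elements are small non-negative integers. The helper names (`make_mask`, `is_subset_mask`, `make_bitset`, `is_subset_bits`, `subsets_of`) and the universe size `kUniverse = 256` are made up for illustration and do not come from the question.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Build a 64-bit mask with one bit per element.
// Assumption: every element lies in [0, 64).
std::uint64_t make_mask(std::initializer_list<unsigned> elems) {
    std::uint64_t mask = 0;
    for (unsigned e : elems) {
        assert(e < 64);
        mask |= std::uint64_t{1} << e;
    }
    return mask;
}

// a is a subset of b exactly when intersecting with b leaves a unchanged.
bool is_subset_mask(std::uint64_t a, std::uint64_t b) {
    return (a & b) == a;
}

// Same idea for a larger universe. kUniverse is a hypothetical bound.
constexpr std::size_t kUniverse = 256;
using BitSet = std::bitset<kUniverse>;

BitSet make_bitset(std::initializer_list<std::size_t> elems) {
    BitSet bits;
    for (std::size_t e : elems) {
        assert(e < kUniverse);
        bits.set(e);
    }
    return bits;
}

bool is_subset_bits(const BitSet& a, const BitSet& b) {
    return (a & b) == a;
}

// Return the indices of all masks in `sets` that are subsets of `x`.
std::vector<std::size_t> subsets_of(const std::vector<std::uint64_t>& sets,
                                    std::uint64_t x) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < sets.size(); ++i) {
        if (is_subset_mask(sets[i], x)) result.push_back(i);
    }
    return result;
}
```

With the sets from the question stored in the order A, B, C, D, calling `subsets_of` with the mask for X = {1, 2, 3, 5} yields the indices of A and C.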

For a large range of elements, you can store the superset in a hash table (once) and then iterate over each potential subset, checking whether all of its elements are in the table.
This is linear in the input size (average case).
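The hash-table approach could look roughly like the following sketch. It assumes the sets are given as `std::vector<int>` without duplicate elements; the function name `subsets_of_hashed` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Return the indices of all sets in `sets` that are subsets of `x`.
// Assumption: each input vector lists its elements without duplicates.
std::vector<std::size_t> subsets_of_hashed(
        const std::vector<std::vector<int>>& sets,
        const std::vector<int>& x) {
    // Store the superset once; lookups are O(1) on average.
    const std::unordered_set<int> lookup(x.begin(), x.end());

    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < sets.size(); ++i) {
        bool all_found = true;
        for (int e : sets[i]) {
            if (lookup.count(e) == 0) {
                // Stop at the first element missing from X.
                all_found = false;
                break;
            }
        }
        if (all_found) result.push_back(i);
    }
    return result;
}
```

Because the inner loop stops at the first element that is not in X, a duplicate-free candidate is abandoned after at most |X| + 1 lookups, however large it is.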


EDIT: (response to the edited question)

Unless you have pre-stored some information about the data, it cannot be done better than O(|X| + n*min{m,|X|}), where |X| is the size of the set X, n is the number of sets, and m is the average size of the sets.
The reason is that, in the worst case, you need to read all elements of all sets (because the last element you read in each set decides whether it is a subset or not), so we cannot do better without prior knowledge of the sets.

The suggested solutions are:
Bitset: O(|X|*n)
Hash solution: O(|X| + min{m,|X|}*n) (average case)

Although the hash solution has better asymptotic complexity, the constants are much better for a bitset, so the bitset solution will probably be faster for small |X|.

amit
  • Your answer is correct. But what if the number of sets in the collection is very large but the number of elements in X is small? Can I do this more efficiently? – Mohammad Sep 24 '12 at 06:43
  • @Mohammad: Edited the answer. – amit Sep 24 '12 at 06:46

If you are not limited in the time you can spend building some extra structures, the O(log(n)) solution would be to store the bit sequences that represent the individual sets in a Trie.

You don't have to compare your set (a.k.a. bitstring) against all the other sets, as Amit supposes. If you have a sorted collection of bitstrings, then each comparison obviously cuts the number of candidates in half. Yes, of course, the time to build the bitset trie is something like O(n*log(n)), but that is a one-time preprocessing cost.
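One possible reading of the trie idea, which is not necessarily what this answer intends, is a trie keyed on the sorted elements of each stored set, where a query follows only the edges whose element is in X. The sketch below assumes C++14 and uses hypothetical names (`TrieNode`, `insert_set`, `collect_subsets`). Its query cost depends on how many trie paths lie entirely inside X, so no O(log(n)) bound is claimed for it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <vector>

// Hypothetical node type: each edge is labelled with one element, and a
// root-to-node path spells out a stored set in increasing element order.
struct TrieNode {
    std::map<int, std::unique_ptr<TrieNode>> children;
    std::vector<std::size_t> set_ids;  // ids of stored sets ending here
};

// Insert a set under the given id. std::set iterates in increasing
// order, which gives every set a unique path in the trie.
void insert_set(TrieNode& root, const std::set<int>& s, std::size_t id) {
    TrieNode* node = &root;
    for (int e : s) {
        std::unique_ptr<TrieNode>& child = node->children[e];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    node->set_ids.push_back(id);
}

// Append to `out` the ids of all stored sets that are subsets of `x`.
// Only edges labelled with an element of x are followed, so every node
// reached corresponds to a set made entirely of elements of x.
void collect_subsets(const TrieNode& node, const std::set<int>& x,
                     std::vector<std::size_t>& out) {
    out.insert(out.end(), node.set_ids.begin(), node.set_ids.end());
    for (int e : x) {
        auto it = node.children.find(e);
        if (it != node.children.end()) {
            collect_subsets(*it->second, x, out);
        }
    }
}
```

The trie is built once as preprocessing. Each query then skips whole branches as soon as they contain an element outside X, instead of testing every stored set individually.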

Viktor Latypov
  • How could a sorted collection of bitstrings cut the number of comparisons in half? E.g., if X={3,6} and 3 is in one of the sets, 6 could still be in another set that does not contain 3. How can we put sets in a Trie? If we only wanted to search for sets in the Trie, the time complexity would be good. But we have to calculate the intersection between X and each item in the Trie. How could a Trie help us find the sets' intersections? – Mohammad Sep 24 '12 at 13:43