Suppose C is a set of containers {c1, c2, c3, ..., cn}, where each container holds a finite set of integers {i1, i2, i3, ..., im}. An integer may appear in more than one container. Given a finite set of integers S = {s1, s2, s3, ..., sz}, find the size of the smallest subset of C whose union contains every integer in S.

Note that there could be thousands of containers, each with hundreds of integers, so a brute-force search over all subsets of C is far too slow.

I tried to solve the problem using a greedy algorithm. That is, each time I select the container with the largest number of integers in the set S, but this gave the wrong result!

Can anyone suggest a fast algorithm for this problem?

David Robinson
Traveling Salesman
  • How does the algorithm relate to bioinformatics? – David Robinson Aug 25 '12 at 16:28
  • This is the well-known [set cover problem](http://en.wikipedia.org/wiki/Set_cover_problem). It is NP-complete, and so no efficient algorithm is known. The greedy algorithm does as well as can be done (unless P=NP). – Gareth Rees Aug 25 '12 at 16:32
  • I am trying to find the smallest window size of GO terms that contains all the given genes. For the sake of simplicity, I used integers and containers. – Traveling Salesman Aug 25 '12 at 16:33
  • @GarethRees: This would be a good answer. – David Robinson Aug 25 '12 at 16:33
  • @TravelingSalesman: That makes sense, but since it doesn't relate to the question directly I removed the tag. I'm sorry you had to come across an NP-hard problem like that in your research (I also work in bioinformatics); it's happened to everyone! – David Robinson Aug 25 '12 at 16:34
  • Thanks, Gareth Rees. I will read about it. – Traveling Salesman Aug 25 '12 at 16:35
  • @TravelingSalesman: When you say the greedy algorithm failed, do you mean that it didn't find the optimal solution, that it was slow, or that you weren't able to implement it? – David Robinson Aug 25 '12 at 16:37
  • By "failed", I mean it gave me the wrong output. There is a case where this algorithm fails: the one I mentioned. – Traveling Salesman Aug 25 '12 at 16:45

1 Answer

This is the well-known set cover problem. It is NP-hard (its decision version is one of the 21 NP-complete problems in Karp's 1972 paper), and so no efficient algorithm is known. Unless you can identify some special extra structure in the problem, you will have to be satisfied with an approximate result: that is, a subset of C whose union contains S, but which is not necessarily the smallest such subset of C.

The greedy algorithm is probably your best bet: it finds a collection of sets that is no more than O(log |S|) times the size of the smallest such collection (more precisely, at most ln |S| + 1 times).

You say that you were unable to get the greedy algorithm to work. I think this is probably because you failed to implement it correctly. You describe your algorithm like this:

> each time I select the container with the largest number of integers in the set S

but the rule in the usual greedy algorithm is to select at each stage the container with the largest number of integers in the set S that are not in any container selected so far.
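For illustration, here is a minimal sketch of that rule in Python. It assumes the containers are given as a list of Python sets; the function name `greedy_set_cover` is made up for this example and is not from the question. It returns the indices of the chosen containers, which form a cover of S but not necessarily a smallest one.

```python
def greedy_set_cover(containers, targets):
    """Greedy approximation for set cover.

    Returns a list of indices into `containers` whose union contains
    `targets`, or None if the containers cannot cover `targets` at all.
    """
    sets = [set(c) for c in containers]
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Pick the container that covers the most integers of S
        # that are NOT yet covered by the containers chosen so far.
        best = max(
            range(len(sets)),
            key=lambda i: len(uncovered & sets[i]),
            default=None,
        )
        if best is None or not (uncovered & sets[best]):
            return None  # some integer in S is in no container
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


# Small example: three containers, S = {1, ..., 6}.
containers = [{1, 3, 4, 6}, {1, 2, 3}, {4, 5, 6}]
S = {1, 2, 3, 4, 5, 6}
print(greedy_set_cover(containers, S))  # [0, 1, 2]
```

On this input the greedy rule selects all three containers even though the last two alone would suffice, which is expected behaviour for an approximation rather than a sign of a bug.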

Gareth Rees
  • Maybe there is something that I am missing here. Suppose we have S = {1,2,3,4,5,6} and 3 containers: c1 = {1,3,4,6}, c2 = {1,2,3}, c3 = {4,5,6}. Implementing the greedy algorithm, I will pick c1 in the first stage because it covers the most uncovered elements. After that I will have to take c2 and then c3. This is obviously not the right solution, because I can take c2 and c3 only. What do you think? – Traveling Salesman Aug 25 '12 at 17:09
  • Since the problem is NP-complete, you cannot expect to compute the smallest cover in a reasonable amount of time. The greedy algorithm is therefore an *approximation algorithm*: it computes *a* cover, but not always the *smallest* cover. If it is important to you that you find the smallest cover, then you are out of luck. – Gareth Rees Aug 25 '12 at 17:28
  • Thanks Gareth Rees for your help. – Traveling Salesman Aug 25 '12 at 17:41
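
The three-container example from the comments can be checked by exhaustive search. The sketch below is only an illustration: the function name `smallest_cover` is hypothetical, it assumes the containers are Python sets, and it tries subsets of C in order of increasing size, so its running time is exponential in the number of containers. It is usable for tiny inputs like this one, not for thousands of containers.

```python
from itertools import combinations


def smallest_cover(containers, targets):
    """Exact set cover by brute force (exponential time).

    Returns the indices of a smallest group of containers whose union
    contains `targets`, or None if no cover exists.
    """
    sets = [set(c) for c in containers]
    targets = set(targets)
    # Try every group of k containers, smallest k first, so the
    # first cover found is guaranteed to be a smallest one.
    for k in range(len(sets) + 1):
        for combo in combinations(range(len(sets)), k):
            covered = set().union(*(sets[i] for i in combo))
            if targets <= covered:
                return list(combo)
    return None


containers = [{1, 3, 4, 6}, {1, 2, 3}, {4, 5, 6}]  # c1, c2, c3
S = {1, 2, 3, 4, 5, 6}
print(smallest_cover(containers, S))  # [1, 2], i.e. c2 and c3
```

This confirms that the optimum here uses two containers, while the greedy rule picks three: a valid cover that stays within the approximation guarantee.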