
We are writing a C# application that will help remove unnecessary data repeaters. A repeater can be removed only if all the data it receives is also received by other repeaters. What we need as a first step is explained below:

I have a collection of int arrays, for example:

a. {1, 2, 3, 4, 5}

b. {2, 4, 6, 7}

c. {1, 3, 5, 8, 11, 100}

There may be thousands of such arrays. I need to find the arrays that can be removed. An array can be removed only if all of its numbers are included in other arrays. In the example above, array a can be removed because its numbers 2 and 4 are in array b and its numbers 1, 3, and 5 are in array c.

What is the best way to do such an operation?

genichm
  • Do you want the minimum or a minimal number of arrays left? – harold Dec 02 '14 at 20:06
  • Does this algorithm need to be deterministic (i.e., give the same result regardless of the order of operations)? – M. Page Dec 02 '14 at 20:06
  • Is the data always going to be integers in the range `1`..`100`? – dav_i Dec 02 '14 at 20:06
  • harold - yes, we need a minimal number of arrays left. M. Page - yes. dav_i - no, the integers may be bigger than 100; at the moment 6-digit integers are the most common. – genichm Dec 02 '14 at 20:15
  • @genichm there is a difference: the minimum number of arrays left is a harder problem (Hitting Set), whereas some minimal number of arrays can be obtained by iteratively removing them. – harold Dec 02 '14 at 20:19

2 Answers


This is not an optimized solution; it does not guarantee the minimum number of arrays left.

Build an abundance dictionary that maps each member of the arrays to the number of arrays containing it. For example:

1 => 2
2 => 2
3 => 2
4 => 2
5 => 2
6 => 1
7 => 1
...

Check each array in turn: if the abundance of every one of its members is greater than 1, remove the array and decrement the count of each of its numbers in the dictionary.
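The steps above could be sketched in C# roughly as follows. This is only a minimal illustration, not a tuned implementation: the class and method names (`RedundantArrays`, `FindRemovable`) are made up for the example, and it assumes a number repeated within one array should count only once, so each array is first turned into a `HashSet<int>`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RedundantArrays
{
    // Hypothetical helper: returns the indices of the arrays that get removed,
    // scanning the arrays in input order.
    public static List<int> FindRemovable(IReadOnlyList<int[]> arrays)
    {
        // Assumption: duplicates inside a single array count once.
        var sets = arrays.Select(a => new HashSet<int>(a)).ToList();

        // Abundance dictionary: number -> how many remaining arrays contain it.
        var abundance = new Dictionary<int, int>();
        foreach (var set in sets)
            foreach (var n in set)
                abundance[n] = abundance.TryGetValue(n, out var count) ? count + 1 : 1;

        var removed = new List<int>();
        for (int i = 0; i < sets.Count; i++)
        {
            // Removable only if every number also occurs in another remaining array.
            if (sets[i].All(n => abundance[n] > 1))
            {
                // Removing the array lowers the abundance of each of its numbers.
                foreach (var n in sets[i])
                    abundance[n]--;
                removed.Add(i);
            }
        }
        return removed;
    }
}
```

For the three arrays in the question, passed in the order a, b, c, this removes only array a (index 0): after a is removed, every number in b and c has an abundance of 1. The result is deterministic for a fixed input order, but a different order can remove a different set of arrays, and nothing here guarantees that the fewest possible arrays remain.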

Ali Sepehri.Kh

Getting the minimum number of remaining arrays (as opposed to a subset of arrays where no more arrays can be removed) is the NP-hard set cover problem. Even with thousands of arrays, however, there's a good chance that, if you apply a mixed integer program solver to the formulation in the linked Wikipedia article, it will be able to find the optimal solution.
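For reference, the standard integer-programming formulation of set cover can be written as below. The notation is an assumption chosen for this example: $U$ is the set of all numbers that occur in any array, $\mathcal{S}$ is the collection of arrays, and $x_S = 1$ means array $S$ is kept.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{S \in \mathcal{S}} x_S \\
\text{subject to}\quad & \sum_{S \in \mathcal{S} \,:\, e \in S} x_S \ge 1 && \text{for all } e \in U, \\
                       & x_S \in \{0, 1\} && \text{for all } S \in \mathcal{S}.
\end{aligned}
```

Each constraint says that every number must still be present in at least one kept array; the arrays with $x_S = 0$ in an optimal solution are the ones that can be removed.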

David Eisenstat