finding the smallest array that intersects with a set of arrays

Question

Say I have three arrays: ['a', 'b'], ['b', 'c'] and ['d']. If I were to create a fourth array that intersects with each of these three arrays using a minimal number of elements, the array I'd get would be ['b', 'd']. My question is: how would I go about finding this array?

For example, ['a', 'b', 'c', 'd'] certainly intersects with all the arrays, but it's not the minimal intersection; ['b', 'd'] is.

Any ideas?

Isn't that just taking the intersection of your original 3 sets? — bhspencer, Jul 23 '15 at 21:28
@bhspencer define intersection (what does 'd' intersect with?) — Amit, Jul 23 '15 at 21:29
@bhspencer I think he wants an array that intersects with any of the arrays. Very interesting — George, Jul 23 '15 at 21:31
I'm almost certain this is NP-Complete and will require a brute force solution. This almost fits perfectly to [Clique problem](https://en.wikipedia.org/wiki/Clique_problem) — Amit, Jul 23 '15 at 22:02

Tom Tseng · Answer 1 · 2015-07-23T22:16:50.130

I don't have a good answer for an algorithm, but it is true that, like commenter Amit wrote, this is an NP-complete problem. This problem is called the hitting set problem, which is equivalent to the set cover problem.
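Since the problem is NP-complete, an exact answer for small inputs can still be found by exhaustive search. Here is a minimal sketch of that idea, not a recommended algorithm: it tries candidate sets in increasing size and returns the first one that hits every array. The function name `smallest_hitting_set` is made up for illustration, and it assumes the elements are hashable and mutually comparable (so they can be sorted).

```python
from itertools import combinations


def smallest_hitting_set(arrays):
    """Return a smallest list of elements that intersects every array.

    Exhaustive search: exponential in the number of distinct elements,
    so only practical for small inputs.
    """
    sets = [set(a) for a in arrays]
    if any(not s for s in sets):
        raise ValueError("an empty array cannot be intersected")

    # Every candidate element appears in at least one input array.
    universe = sorted(set().union(*sets))

    # Try sizes 0, 1, 2, ... so the first hit is guaranteed to be smallest.
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            chosen = set(candidate)
            if all(chosen & s for s in sets):
                return list(candidate)


print(smallest_hitting_set([['a', 'b'], ['b', 'c'], ['d']]))  # ['b', 'd']
```

Because sizes are tried in increasing order, the first candidate that works is a minimum one; there may be other answers of the same size.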

If you're fine with approximation, then according to the wiki article, greedily picking the element that hits the most not-yet-hit arrays is about as good as it gets.
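A rough sketch of that greedy approximation, assuming the elements are hashable: repeatedly pick the element that occurs in the most arrays not yet hit, then discard the arrays it hits. The function name `greedy_hitting_set` is hypothetical, and ties are broken arbitrarily.

```python
def greedy_hitting_set(arrays):
    """Approximate a small hitting set by greedy selection.

    The result intersects every array but is not guaranteed to be minimum.
    """
    remaining = [set(a) for a in arrays]
    if any(not s for s in remaining):
        raise ValueError("an empty array cannot be intersected")

    chosen = []
    while remaining:
        # Count how many still-unhit arrays each element appears in.
        counts = {}
        for s in remaining:
            for x in s:
                counts[x] = counts.get(x, 0) + 1

        # Take the element that hits the most remaining arrays.
        best = max(counts, key=counts.get)
        chosen.append(best)

        # Drop every array that the new element hits.
        remaining = [s for s in remaining if best not in s]
    return chosen


print(greedy_hitting_set([['a', 'b'], ['b', 'c'], ['d']]))  # ['b', 'd']
```

On the question's arrays this picks 'b' first (it hits two arrays) and then 'd'. It is only an approximation, though: for [1,2] [2,3] [2,4] [1,5] [3,7] [4,8] it picks 2 first and then needs one element for each of the three remaining arrays, giving four elements, even though the three-element answer [1,3,4] exists.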

edited Jul 23 '15 at 22:16

answered Jul 23 '15 at 22:13


I just couldn't find the correct problem type... and there it was, so classic :-) – Amit Jul 23 '15 at 22:15
Great article. Thanks for the link Tom Tseng. – John Jul 23 '15 at 22:26

Jared Price · Answer 2 · 2015-07-23T22:28:44.963

I think what you might want to try is going through each array and collecting the values that appear in more than one array. Then, once you have those values, determine which of them you can remove while still matching every array.

Example:

[1,2] [2,3] [2,4] [1,5] [3,7] [4,8]

After looping through, we find that [1,2,3,4] are all values which match in more than one array.

Now we must determine if there are any values we can eliminate from this list.

If we eliminate 1, will everything still match?

No, we need 1.

If we eliminate 2, will everything still match?

Yes, we can eliminate 2 from the array. Now we have [1,3,4].

If we eliminate 3, will everything still match?

No, we need 3.

If we eliminate 4, will everything still match?

No, we need 4.

Our final array is [1,3,4].

This will not work if you have a completely unique array, that is, one that shares no values with any other array. To account for this, you could keep a boolean flag per array, all initially false, and set a flag to true when one of the collected values matches that array. For any array whose flag is still false at the end, you would have to pick a value from that array.
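The steps above could be sketched roughly like this. It is only my reading of the procedure, with a hypothetical function name `prune_shared_values`, and it assumes the values are hashable and sortable. Instead of a separate boolean array it checks each array directly and, for any array no collected value matches, adds that array's smallest value.

```python
from collections import Counter


def prune_shared_values(arrays):
    """Heuristic: collect shared values, patch unique arrays, then prune.

    The result matches every array and has no removable value,
    but it is not guaranteed to be the smallest possible answer.
    """
    sets = [set(a) for a in arrays]
    if any(not s for s in sets):
        raise ValueError("an empty array cannot be intersected")

    # Step 1: values that appear in more than one array.
    counts = Counter(x for s in sets for x in s)
    chosen = [x for x in sorted(counts) if counts[x] > 1]

    # Step 2: an array matched by no chosen value contributes one value.
    for s in sets:
        if not s & set(chosen):
            chosen.append(min(s))

    # Step 3: try eliminating each value; keep the removal only if
    # every array is still matched without it.
    for x in list(chosen):
        rest = [y for y in chosen if y != x]
        if all(s & set(rest) for s in sets):
            chosen = rest
    return chosen


print(prune_shared_values([[1, 2], [2, 3], [2, 4], [1, 5], [3, 7], [4, 8]]))  # [1, 3, 4]
```

The elimination order matters. For [1,2] [1,3] [1,4] [2,3,4] this sketch removes 1 first and ends with [2,3,4], while [1,2] would have been enough, so treat it as a heuristic rather than an exact method.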

edited Jul 23 '15 at 22:28

answered Jul 23 '15 at 22:05


Sorry to put you down again, but really, this is not solvable. You can only come up with a method (I'm intentionally not calling this an algorithm) that feels like it's working, but with the correct input it will fail (like I showed you in your previous attempt). Again, sorry... – Amit Jul 23 '15 at 22:17
So you're saying there isn't a correct answer? Could you show me an example where this wouldn't work? – Jared Price Jul 23 '15 at 22:19
Nevermind, this wouldn't work if you had a completely unique array. – Jared Price Jul 23 '15 at 22:23
What I'm saying (and to be more accurate, what the world of set theory says) is that this is defined as NP-Complete. You can't guarantee a correct answer and at the same time guarantee efficiency better than brute force (test all possible answers). Whether I can (or have the time to) come up with an input that breaks your current method is irrelevant – Amit Jul 23 '15 at 22:23
