
I have a large set of comparable things (call them widgets), and I need to choose a subset of them. Widgets can conflict with other widgets, and any two of them can be "compared"[1] to see which is "better" in order to resolve a conflict. I want to choose the best subset that contains as many non-conflicting widgets as possible.

I want my choice to be robust and deterministic: if I add some irrelevant widgets that conflict with lots of others, I don't want them to change the result compared to what it would have been without them. This is what I mean by finding the "deterministic best" subset: a unique one, or at worst the same one every time out of a set of equivalent subsets.

My implementation looks something like this:

Build a graph of widgets, where edges are conflicts between widgets.

While there are edges in the conflict graph:

    Make a list of the edges in the conflict graph, sorted so that the most
    conflicted widgets are at the front of the list.

    Pick the first edge in the list, compare the two, delete the loser. (note:
    this step includes tidying up the graph, which is why we need to refresh
    the list of conflict edges)

Make a list, possibles, of deleted widgets that don't conflict with any in the
final set.

Call this procedure recursively on possibles in case they're conflicted with
each other.

Add back the return values from the recursive call, and return the subset.

(Don't worry too much about the recursion: it never goes deeper than one recursive call, as the possibles subset is small in my case.)
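
For concreteness, the steps above might be sketched in Python roughly as follows. This is only a sketch under assumptions, not the actual implementation: the name `pick_subset`, the numeric `score` dict standing in for the pairwise comparison, reading "most conflicted" as "the edge whose busier endpoint has the most conflicts", and breaking ties by widget id are all choices made for illustration.

```python
def pick_subset(widgets, conflicts, score):
    """Sketch of the edge-elimination procedure described above.

    widgets   -- iterable of orderable ids (hypothetical representation)
    conflicts -- iterable of (id, id) pairs
    score     -- dict id -> number; ASSUMED stand-in for "compare the two"
    """
    widgets = set(widgets)
    # Conflict graph as adjacency sets, restricted to the given widgets.
    adj = {w: set() for w in widgets}
    for a, b in conflicts:
        if a in widgets and b in widgets and a != b:
            adj[a].add(b)
            adj[b].add(a)
    full = {w: set(n) for w, n in adj.items()}  # untouched copy for later

    deleted = set()
    while any(adj.values()):  # while there are edges left
        # Rebuild the edge list, most conflicted endpoint first;
        # ties are broken by the ids so the order is reproducible.
        edges = sorted(
            {tuple(sorted((a, b))) for a in adj for b in adj[a]},
            key=lambda e: (-max(len(adj[e[0]]), len(adj[e[1]])), e),
        )
        a, b = edges[0]
        # Lower score loses; on equal scores the smaller id loses (assumption).
        loser = a if (score[a], a) < (score[b], b) else b
        # Tidy up the graph: drop the loser and every edge touching it.
        for n in adj.pop(loser):
            adj[n].discard(loser)
        deleted.add(loser)

    kept = set(adj)
    # Deleted widgets that conflict with nothing in the final set.
    possibles = {w for w in deleted if not (full[w] & kept)}
    if possibles:
        sub_conflicts = [
            (a, b) for a in possibles for b in full[a]
            if b in possibles and a < b
        ]
        # Recurse in case the possibles conflict with each other.
        kept |= pick_subset(possibles, sub_conflicts, score)
    return kept


# Path x - y - z: x loses to y, y then loses to z, and x is added back.
print(sorted(pick_subset("xyz", [("x", "y"), ("y", "z")],
                         {"x": 1, "y": 2, "z": 3})))  # ['x', 'z']
```

Because the edge order depends on how many conflicts each widget currently has, adding a heavily conflicted widget can reorder the eliminations, which is the sensitivity described next.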

Unfortunately, this algorithm doesn't do what I want in terms of being immune to a few highly conflicted additions to the input set! I think this is because of the way I bias the procedure to look at the most irrelevant widgets first, and these have a knock-on effect on which edges are left in the conflict graph. My aim in deleting them first was to remove their influence as soon as possible - it seems this was misguided.

Presumably this problem is analogous to a solved one - if so, I'd be grateful if someone could tell me which one, and (even better) briefly explain any jargon in their references!

If not (or my explanation is too vague) please let me know what parts of the CS literature to go and read.

[1] Comparison is sort of suitable for sorting (i.e., if w1 > w2 and w2 > w3, then we know for free that w1 > w3), but evaluating the "fitness" of a subset as a whole wouldn't work, as there's no sensible way to compare (w1, w2, w3) to (w1, w2, w4, w5).

tehwalrus

1 Answer


This is an NP-complete problem known as the "weighted independent set" problem (often called maximum-weight independent set). I'm not sure exactly what you mean by "deterministic", but one easy definition might be: "if adding widgets to the input set changes the output set, then at least one of the added widgets is contained in the output set". That is, new widgets can change the output by looking attractive, but they can't interfere with what is going on in unrelated parts of the graph.
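
To make that definition concrete, here is a minimal sketch of a check for one particular set of additions. Everything named here is hypothetical: `is_stable` is an illustrative helper, and `select` is assumed to be any function that takes `(widgets, conflicts)` and returns the chosen set.

```python
def is_stable(select, widgets, conflicts, extra_widgets, extra_conflicts):
    """Check the property for one specific set of added widgets.

    Returns True if the output is unchanged by the additions, or if at
    least one added widget made it into the new output.
    """
    before = set(select(list(widgets), list(conflicts)))
    after = set(select(list(widgets) + list(extra_widgets),
                       list(conflicts) + list(extra_conflicts)))
    return after == before or bool(after & set(extra_widgets))


# A deliberately poor selector (illustration only): keep just the widgets
# that have no conflicts at all.
def unconflicted_only(widgets, conflicts):
    return {w for w in widgets if all(w not in pair for pair in conflicts)}

# Adding "n" in conflict with "a" knocks "a" out while "n" itself is not
# selected, so this selector violates the property.
print(is_stable(unconflicted_only, ["a", "b"], [], ["n"], [("n", "a")]))  # False
```

A check like this can only refute the property on the inputs you try; it doesn't prove it for a given algorithm.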

A simple approximation that satisfies this would be to greedily pick a maximum-score widget, add it to the output set, then remove it and every widget that conflicts with it from consideration. Repeat until no unselected widgets remain. I'm not sure whether this gives a decent approximation guarantee, though.
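
A minimal sketch of that greedy approach, assuming each widget can be given a numeric `score` consistent with the pairwise comparison and that widget ids are orderable (used only to break ties reproducibly). The name `greedy_select` and the data layout are illustrative choices, not anything from the question.

```python
def greedy_select(widgets, conflicts, score):
    """Greedy pick: best remaining widget first, then drop its conflicts.

    widgets   -- iterable of orderable ids (assumed representation)
    conflicts -- iterable of (id, id) pairs
    score     -- dict id -> number; higher is better (assumption)
    """
    adj = {w: set() for w in widgets}
    for a, b in conflicts:
        if a != b:  # ignore self-conflicts
            adj[a].add(b)
            adj[b].add(a)

    chosen, blocked = set(), set()
    # Visiting widgets once in descending score order is equivalent to
    # repeatedly taking the best widget that is still available.
    for w in sorted(adj, key=lambda w: (-score[w], w)):
        if w not in blocked:
            chosen.add(w)
            blocked |= adj[w]  # everything conflicting with w is out
    return chosen


scores = {"x": 1, "y": 2, "z": 3, "n": 0}
base = greedy_select("xyz", [("x", "y"), ("y", "z")], scores)
# Add a low-scoring widget "n" that conflicts with everything.
noisy = greedy_select("xyzn",
                      [("x", "y"), ("y", "z"),
                       ("n", "x"), ("n", "y"), ("n", "z")], scores)
print(sorted(base), sorted(noisy))  # ['x', 'z'] ['x', 'z']
```

The reason this should meet the definition above: a widget only affects the rest of the run by being chosen and blocking its neighbours, so added widgets that are never chosen leave every other decision exactly as it was.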

Sneftel
  • Thanks for this info and the suggestion. I'll see if the greedy approach satisfies the condition I wanted (I think it's likely to). The "deterministic" part was indeed the condition you mention, and I agree that perhaps it was the wrong word. – tehwalrus Oct 11 '13 at 11:17
  • If the conflict is an equivalence relation (i.e., we additionally know that if `w1` conflicts with `w2` and `w2` conflicts with `w3`, then `w1` conflicts with `w3`), the approximation would give the exact answer. Otherwise, this problem is NP-complete, as Ben said =) – justhalf Oct 11 '13 at 11:48
  • Alas, no - there is no such simple relation describing the conflicts. It's OK that it's an NP-complete problem; that's always a possibility for anything to do with combinatorics/graphs, and as usual with such problems it's OK to just have *a good answer* (as long as it's the same one for inputs that vary in irrelevant ways!). I'm still working on implementing it, by the way; it exposed a bug in some other code, so I'll get back to you soon on that. – tehwalrus Oct 11 '13 at 13:13