I have a big set of comparable things (call them widgets), and I need to choose a subset of them. Widgets can conflict with other widgets, and any two can be "compared"[1] to see which is "better" - that's how conflicts get resolved. I want to choose the best subset that contains as many non-conflicting widgets as possible.
I want my choice to be robust and deterministic - if I add some irrelevant widgets that conflict with lots of others, I don't want them to change the result compared to when they weren't included. This is what I mean by finding the "deterministic best" subset: a unique one, or at worst the same one every time out of a set of equivalent subsets.
My implementation looks something like this:
1. Build a graph of widgets, where edges are conflicts between widgets.
2. While there are edges in the conflict graph:
   - Make a list of the edges in the conflict graph, sorted so that the most conflicted widgets are at the front of the list.
   - Pick the first edge in the list, compare the two endpoints, and delete the loser. (This step includes tidying up the graph, which is why the list of conflict edges has to be refreshed each time around.)
3. Make a list, possibles, of deleted widgets that don't conflict with anything in the surviving set.
4. Call this procedure recursively on possibles, in case they're conflicted with each other.
5. Add the return values from the recursive call back in, and return the subset.

(Don't worry too much about the recursion - it never goes beyond one recursive call, as the possibles subset is small in my case. A rough Python sketch of the whole procedure is below.)
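In case it helps to pin down what I mean, here's a minimal Python sketch of the procedure above. The function name `resolve`, the `better(a, b)` comparator, and the frozenset-pair representation of conflicts are all just for illustration, not my real code:

```python
def resolve(widgets, conflicts, better):
    """Greedy conflict resolution as sketched above.

    widgets   -- iterable of hashable widget ids
    conflicts -- set of frozenset({a, b}) conflict pairs
    better    -- better(a, b) -> whichever of a, b wins the comparison
    """
    # Step 1: build the conflict graph as an adjacency map.
    adj = {w: set() for w in widgets}
    for a, b in (tuple(p) for p in conflicts):
        adj[a].add(b)
        adj[b].add(a)

    deleted = set()
    # Step 2: while edges remain, attack the most conflicted widgets first.
    while any(adj[w] for w in adj):
        # Rank edges by the total degree of their endpoints, highest first.
        # NOTE: ties are broken arbitrarily here, which is one source of
        # the instability I'm asking about.
        edges = sorted(
            {frozenset((a, b)) for a in adj for b in adj[a]},
            key=lambda e: -sum(len(adj[w]) for w in e),
        )
        a, b = tuple(edges[0])
        loser = a if better(a, b) == b else b
        # Delete the loser and tidy its edges out of the graph.
        for n in adj[loser]:
            adj[n].discard(loser)
        del adj[loser]
        deleted.add(loser)

    survivors = set(adj)
    # Step 3: deleted widgets that don't conflict with any survivor.
    possibles = {
        w for w in deleted
        if not any(frozenset((w, s)) in conflicts for s in survivors)
    }
    # Step 4: recurse on possibles in case they conflict with each other.
    if possibles:
        inner = {p for p in conflicts if p <= possibles}
        survivors |= resolve(possibles, inner, better)
    # Step 5: survivors now include the added-back widgets.
    return survivors
```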
Unfortunately, this algorithm isn't immune to a few highly conflicted additions to the input set! I think that's because of the way I bias the procedure to look at the most conflicted widgets first: deleting them early has a knock-on effect on which edges are left in the conflict graph. My aim in deleting them first was to remove their influence as soon as possible - it seems this was misguided.
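To make that concrete, here's a toy run of the sketch above (made-up data; the comparator just says the lexicographically larger name wins):

```python
widgets = ["w1", "w2", "w3", "w4"]
conflicts = {frozenset(p) for p in [("w1", "w2"), ("w2", "w3"), ("w3", "w4")]}
better = max  # toy comparator: larger name wins

print(resolve(widgets, conflicts, better))  # -> {'w1', 'w4'}

# A high-degree interloper gets looked at early. If it wins a few
# comparisons before eventually losing one, it can knock out widgets
# that would otherwise have survived, changing the result even though
# the interloper itself ends up deleted.
```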
Presumably this problem is analogous to a solved one - if so, I'd be grateful if someone could tell me which one, and (even better) briefly explain any jargon in their references!
If not (or if my explanation is too vague), please let me know which parts of the CS literature to go and read.
[1] The comparison is transitive, so it's sort-of suitable for sorting (i.e., if w1 > w2 and w2 > w3, then we know for free that w1 > w3), but evaluating the "fitness" of a subset as a whole wouldn't work, as there's no sensible way to compare (w1, w2, w3) to (w1, w2, w4, w5).
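For what it's worth, a transitive pairwise comparison is all that's needed to impose a total order for sorting. The field names and comparator below are made up for illustration:

```python
from functools import cmp_to_key

def compare(a, b):
    """Pairwise comparison; positive means a is the better widget."""
    return a["rank"] - b["rank"]  # hypothetical numeric "rank" field

widgets = [{"id": "w2", "rank": 5}, {"id": "w1", "rank": 9}, {"id": "w3", "rank": 7}]
best_first = sorted(widgets, key=cmp_to_key(compare), reverse=True)
# Transitivity is what makes this ordering coherent; nothing similar
# exists for whole subsets of different sizes, which is why a
# subset-level "fitness" score isn't available to me.
```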