Dividing players into "winners" and "losers": how to prove that greedy solution gives optimal result?

Question

I have a problem that states the following:

n players (where n is even) are to play games against each other. Not everyone will necessarily play, and any two players can play each other at most once. If two people do play against each other, there is one winner and one loser. I then wish to partition my n players into two sets of size n/2: winners (W) and losers (L). I want no player in my winners set to have lost against someone in my losers set.

This is not always possible. For example, with 4 players and games where p1 won against p2, p2 won against p3, p3 won against p4 and p4 won against p1, there is no way to partition the players into W and L without a violation. So I do the next best thing, which is to minimise my error: the number of pairs of players where a player in W has lost to a player in L (not playing against each other is not a loss).

I think I found a greedy solution to this problem. I simply sort the players by their number of losses, place the n/2 players with the fewest losses in my W set, and put the rest in L. How do I go about proving that my greedy approach is in fact optimal? I have done several random tests and I can show that my approach always gives a feasible solution, but I don't know how to show that it does in fact minimise my error.
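For concreteness, the greedy rule and the error measure described above might be sketched in Python as follows. The function names (`count_errors`, `greedy_partition`), the representation of games as `(winner, loser)` tuples, and the tie-breaking by input order are assumptions made for illustration; the question does not specify them.

```python
from collections import Counter

def count_errors(games, winners):
    """Count games in which the loser was placed in W and the winner in L.

    games   -- list of (winner, loser) tuples (assumed representation)
    winners -- set of players assigned to W
    """
    return sum(1 for w, l in games if l in winners and w not in winners)

def greedy_partition(players, games):
    """Put the n/2 players with the fewest losses into W, the rest into L."""
    losses = Counter(l for _, l in games)
    # sorted() is stable, so ties are broken by the order of `players`
    # (an arbitrary choice; the question leaves tie-breaking open).
    ranked = sorted(players, key=lambda p: losses[p])
    half = len(players) // 2
    return set(ranked[:half]), set(ranked[half:])

# The 4-player cycle from the question: p1 > p2 > p3 > p4 > p1.
players = ["p1", "p2", "p3", "p4"]
games = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p1")]
W, L = greedy_partition(players, games)
print(sorted(W), sorted(L), count_errors(games, W))
```

On the cycle every player has exactly one loss, so the result here depends entirely on the tie-breaking order.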

Interesting question. If unsure, choose a value for n that is as large as is tractable (N = 12, perhaps) and search exhaustively for a counterexample. If you found a counterexample (and I suspect that you might), that would save you some time and trouble, wouldn't it? And if you found none, that would motivate your further search for the proof. — thb, Feb 09 '19 at 01:52
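A minimal sketch of such an exhaustive check, assuming games are stored as `(winner, loser)` tuples and that random game sets are a reasonable source of test cases. All names here (`optimal_error`, `best_greedy_error`, `find_counterexample`) are hypothetical. To avoid blaming the greedy rule for an unlucky tie-break, the sketch gives it the best error over every way of breaking ties at the cutoff.

```python
import itertools
import random
from collections import Counter

def count_errors(games, winners):
    # A game is an error when its loser is in W and its winner is in L.
    return sum(1 for w, l in games if l in winners and w not in winners)

def optimal_error(players, games):
    # Brute force over every possible W of size n/2.
    half = len(players) // 2
    return min(count_errors(games, set(c))
               for c in itertools.combinations(players, half))

def best_greedy_error(players, games):
    # Best error the "fewest losses" rule can reach over all tie-breaks.
    losses = Counter(l for _, l in games)
    half = len(players) // 2
    ranked = sorted(players, key=lambda p: losses[p])
    cutoff = losses[ranked[half - 1]]
    forced = [p for p in players if losses[p] < cutoff]   # always in W
    tied = [p for p in players if losses[p] == cutoff]    # compete for the rest
    need = half - len(forced)
    return min(count_errors(games, set(forced) | set(c))
               for c in itertools.combinations(tied, need))

def random_games(players, rng, p=0.5):
    # Each pair plays with probability p; the winner is a coin flip.
    games = []
    for a, b in itertools.combinations(players, 2):
        if rng.random() < p:
            games.append((a, b) if rng.random() < 0.5 else (b, a))
    return games

def find_counterexample(n=6, trials=200, seed=0):
    rng = random.Random(seed)
    players = list(range(n))
    for _ in range(trials):
        games = random_games(players, rng)
        if best_greedy_error(players, games) > optimal_error(players, games):
            return games
    return None

result = find_counterexample()
print(result)
```

The brute force examines C(n, n/2) subsets per trial, which is why n has to stay small (924 subsets for n = 12).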

Matt Timmermans · Answer 1 · 2019-02-09T05:39:18.317

3

Your greedy algorithm is not optimal. It fails for:

Winner     Loser
======     =====
  A    vs    x
  B    vs    y
  C    vs    z
  B    vs    A
  C    vs    A
  x    vs    y

The optimal partition is W=(A,B,C), L=(x,y,z), but you will put A in the loser set, because he has 2 losses.
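A quick check of this example, using the error definition from the question. The `(winner, loser)` tuple representation and the helper name `count_errors` are assumptions made for illustration.

```python
from collections import Counter

def count_errors(games, winners):
    # A game is an error when its loser is in W and its winner is in L.
    return sum(1 for w, l in games if l in winners and w not in winners)

# Each tuple is (winner, loser), read off the table above.
games = [("A", "x"), ("B", "y"), ("C", "z"),
         ("B", "A"), ("C", "A"), ("x", "y")]

losses = Counter(l for _, l in games)
loss_table = {p: losses[p] for p in "ABCxyz"}
print(loss_table)

# The zero-error partition, then the two partitions the greedy rule can
# produce: B and C have no losses, and x and z tie for the third slot.
for W in [{"A", "B", "C"}, {"B", "C", "x"}, {"B", "C", "z"}]:
    print(sorted(W), count_errors(games, W))
```

Under these assumptions the loss counts are A: 2, B: 0, C: 0, x: 1, y: 2, z: 1, so A always lands in L. The tie between x and z then decides whether the greedy partition has one error or none, which is the tie-breaking issue discussed in the comments.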

You say you did some randomized tests. How did you validate that your greedy algorithm produced the correct results for these tests?

edited Feb 09 '19 at 05:39

answered Feb 09 '19 at 05:21

Matt Timmermans


Ah. Can you point me in the right direction for the correct approach? – Ayumu Kasugano Feb 09 '19 at 05:33
Actually, for your example B & C have the fewest losses so they must go into W. The next place is tied between X, Y and Z. If you pick Y then a partition W=(B,C,Y) and L=(A,X,Z) works just as well. – Ayumu Kasugano Feb 09 '19 at 05:36
Edited so you can't pick `y`. But that's not really necessary, since "optimal when it's lucky" isn't the same as "optimal" – Matt Timmermans Feb 09 '19 at 05:41
Your edit still works with my approach. The least wins are either W = (B, C, X) or (B,C,Z). If you choose (B,C,Z), we still get an ideal partition. I agree this looks worrying, but I tried to argue to myself I could break ties arbitrarily. – Ayumu Kasugano Feb 09 '19 at 05:46

You said sort by losses, not wins. Yes, I think you're a victim of wishful thinking. I think your problem is hard and I don't have a really good way to do it, so what I hope to help you with is understanding why your tests failed to detect that your algorithm doesn't work. It's likely that it has to do with this kind of wishful thinking. – Matt Timmermans Feb 09 '19 at 13:32
Matt is right, but it's worth mentioning that *optimal* has a strict, formal definition which many algorithms don't satisfy; but there are lots of non-optimal but exceedingly useful heuristics. This is likely among them. If you have six fast heuristics that almost always will contain the right answer and a slow but correct algorithm to give the optimal answer, and you don't really *need* optimality, it would be silly to insist on formal correctness. Be careful what you wish for. Probably a more useful question for you is, "how big can my error be?" – Patrick87 Feb 13 '19 at 18:31

score 0 · Answer 2 · answered Feb 13 '19 at 18:19

Consider the following outcomes:

Winner    Loser
Adam      John
Bob       John
John      Charles
John      David
John      Ernest
John      Frank
John      George

We tally up the losses and sort in ascending order:

Player    Losses
Adam      0
Bob       0
Charles   1
David     1
Ernest    1
Frank     1
George    1
John      2

Your algorithm divides the players as follows:

Winners    Losers
Adam       Ernest
Bob        Frank
Charles    George
David      John

The errors are (Charles, John) and (David, John); there are two errors. Consider instead the following division:

Winners    Losers
Adam       David
Bob        Ernest
Charles    Frank
John       George

There is no error in this division: there is no winner who lost to a loser. This is a better division, with less error; so your algorithm, as stated, is not optimal.
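The comparison above can be checked mechanically. The sketch below assumes games are stored as `(winner, loser)` tuples; it enumerates every way the "fewest losses" rule could break the five-way tie and compares that against a brute-force search over all 4-player winner sets. The helper name and the hard-coded loss thresholds are specific to this illustration.

```python
import itertools
from collections import Counter

def count_errors(games, winners):
    # A game is an error when its loser is in W and its winner is in L.
    return sum(1 for w, l in games if l in winners and w not in winners)

# (winner, loser) pairs from the outcome table above.
games = [("Adam", "John"), ("Bob", "John"), ("John", "Charles"),
         ("John", "David"), ("John", "Ernest"), ("John", "Frank"),
         ("John", "George")]
players = sorted({p for g in games for p in g})
half = len(players) // 2
losses = Counter(l for _, l in games)

# Greedy: Adam and Bob (0 losses) are always in W; any two of the five
# players with exactly 1 loss fill the remaining slots.
forced = [p for p in players if losses[p] == 0]
tied = [p for p in players if losses[p] == 1]
greedy_errors = {count_errors(games, set(forced) | set(c))
                 for c in itertools.combinations(tied, half - len(forced))}

# Brute force over every possible winner set of size n/2.
optimal = min(count_errors(games, set(c))
              for c in itertools.combinations(players, half))

print(greedy_errors, optimal)
```

Every tie-break leaves two one-loss players in W who lost to John in L, so the greedy error is 2 regardless, while the brute-force minimum is 0.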

The fundamental problem with your algorithm is that it considers only the number of losses; prolific players can appear worse to this algorithm than they really are simply because they have more losses than others, despite possibly having many more wins.

It might be interesting to propose the following modification to your algorithm: first sort ascending by number of losses, calling the top half winners and bottom half losers; then, sort descending by wins, calling the top half winners and bottom half losers. Then, choose whichever of these partitions has the least error. I don't know if that is optimal or not, but it easily dispels the kinds of counterexamples Matt and I have provided. If that's not optimal you could maybe even throw "best batting average" on the pile. This is probably NP-Hard, though. — Patrick87, Feb 13 '19 at 18:27
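A rough sketch of that "best of two orderings" idea, assuming games are `(winner, loser)` tuples and ties are broken by input order. The names `top_half` and `best_of_two` are made up for this example, and nothing here shows the combined heuristic is optimal.

```python
from collections import Counter

def count_errors(games, winners):
    # A game is an error when its loser is in W and its winner is in L.
    return sum(1 for w, l in games if l in winners and w not in winners)

def top_half(players, key):
    # First n/2 players under the given sort key (stable, so ties keep input order).
    ranked = sorted(players, key=key)
    return set(ranked[: len(players) // 2])

def best_of_two(players, games):
    wins = Counter(w for w, _ in games)
    losses = Counter(l for _, l in games)
    candidates = [
        top_half(players, lambda p: losses[p]),   # ascending by losses
        top_half(players, lambda p: -wins[p]),    # descending by wins
    ]
    # Keep whichever candidate W has the smaller error.
    return min(candidates, key=lambda W: count_errors(games, W))

players = ["Adam", "Bob", "Charles", "David",
           "Ernest", "Frank", "George", "John"]
games = [("Adam", "John"), ("Bob", "John"), ("John", "Charles"),
         ("John", "David"), ("John", "Ernest"), ("John", "Frank"),
         ("John", "George")]

W = best_of_two(players, games)
print(sorted(W), count_errors(games, W))
```

On the John example the wins ordering puts John (5 wins) in W and yields no errors, whereas the losses ordering alone yields two.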
