
I have a specific game which is not literally zero-sum, because points are awarded by the game itself during a match, but it is close to zero-sum in the sense that the total number of points has a clear upper limit: the more points you score, the fewer points are available to your opponents. The game is played by 5 players, with no teams whatsoever.

I'm making a genetic algorithm play rounds against itself with pseudo-random "mutations" between generations.

But after a couple hundred generations, a pattern always emerges: the algorithm ends up strongly favoring a specific player (for example, the player who plays first). Since the mutations that give the "best results" serve as the base for the next generation, the population seems to drift toward some version of "if you are the first player, play this way (a very specific yet fairly arbitrary technique that gives bad, or at best average, results), and if you are not, then play in this specific way that indirectly but strongly favors the first player".

Then, over the following generations, the player whose turn order is strongly favored starts mutating totally at random, because it wins every round no matter what it does, as long as the part of the algorithm that favors that player stays intact.

I'm looking for a way to prevent this specific evolutionary route, but I can't figure out how to "reward" a victory earned by your own strategy more than a victory earned because the other players' strategies helped you a lot.
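Simplified, the selection loop I'm running looks roughly like this (a Python-style sketch; the game logic, genome encoding and mutation are stubbed out here, and the real implementation is more involved):

    import itertools
    import random

    # Stubs for illustration only -- the real game and genome are game-specific.
    def play_game(genomes):
        """Return the seat index (0-4) of the player who won the game."""
        return random.randrange(len(genomes))

    def mutate(genome):
        """Return a pseudo-randomly perturbed copy of a genome."""
        return [g + random.gauss(0, 0.1) for g in genome]

    def evolve(population, generations, players_per_game=5):
        for _ in range(generations):
            wins = [0] * len(population)
            # One game for every combination of 5 genomes, start order shuffled.
            for combo in itertools.combinations(range(len(population)), players_per_game):
                seats = list(combo)
                random.shuffle(seats)          # randomize who plays first
                winner = play_game([population[i] for i in seats])
                wins[seats[winner]] += 1
            # The genome with the most wins becomes the base of the next generation.
            best = population[wins.index(max(wins))]
            population = [mutate(best) for _ in range(len(population))]
        return population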

Kaito Kid
  • The same genetic algo. is used for all players (with perhaps separate mutations per player)? I'm more familiar with evolution than genetic algorithms, but this seems to be an inherent problem. You've essentially evolved the equivalent of a bee-hive: worker bees don't reproduce, but instead work to protect their queen (who is their genetic sister). The optimization target is not "I win" but "This algo. produces a winner." Instead of having all players use the same algo. (modulo mutations), I feel that there should be a number of algorithms being evolved simultaneously (or one at a time). – jwimberley May 15 '17 at 17:16
  • One idea would be to (during training) penalise players for points their opponents score. Or possibly award bonus points for scoring more points than any other player or weight points non-linearly. A large part of problems like these is experimentation, so I suspect this question's a bit too broad for [so]. – Bernhard Barker May 16 '17 at 01:17
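As a rough illustration of that penalty idea (the shaped_fitness function and the alpha weight below are hypothetical, not anything from the question's code), one could score each seat by its own points minus a fraction of the opponents' average:

    def shaped_fitness(points, alpha=0.5):
        """Score one game: own points minus a penalty proportional to the
        average of the opponents' points. `alpha` is a hypothetical knob for
        how strongly opponents' scores count against you."""
        total = sum(points)
        return [p - alpha * (total - p) / (len(points) - 1) for p in points]

    # Winning big now scores better than winning narrowly:
    print(shaped_fitness([40, 10, 10, 10, 10]))  # clear winner stands out
    print(shaped_fitness([22, 21, 20, 19, 18]))  # crowded finish, small margins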

2 Answers


I think this happens because only the winner of the round-robin tournament gets promoted and mutated each generation. At first players win more or less at random, but then a strategy comes up that favors one position. My guess is that deviating slightly from that strategy (through the pseudo-random mutations) only makes you lose the games where you are in the favoured position without winning any of the others, so the population never moves away from it: something like a local Nash equilibrium.

You could try to keep more than one individual per generation and generate mutations from each of them. But I doubt this will help; at best it delays the effect, because the code of the best individual will soon spread to everyone. That seems to be the root cause of the problem.

Therefore my suggestion would be to have t tribes, where each tribe has x/t individuals. Instead of playing a round-robin tournament, each individual plays only against individuals from other tribes. Then you keep the best individual per tribe, mutate it, and proceed to the next generation, so the tribes never mix genes.
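A rough sketch of that tribe structure, assuming the number of tribes t equals the number of seats in a game (5), so each game seats exactly one member of every tribe; play_game and mutate are stand-in stubs, not anything from the question:

    import itertools
    import random

    def play_game(genomes):
        """Stub: return the seat index of the winner."""
        return random.randrange(len(genomes))

    def mutate(genome):
        """Stub: pseudo-random perturbation of a genome."""
        return [g + random.gauss(0, 0.1) for g in genome]

    def evolve_tribes(tribes, generations):
        """`tribes` is a list of t lists of genomes; genes never cross tribes."""
        t = len(tribes)
        for _ in range(generations):
            wins = [[0] * len(tribe) for tribe in tribes]
            # Every game seats one member from each tribe, so individuals only
            # ever play against members of *other* tribes.
            for picks in itertools.product(*(range(len(tr)) for tr in tribes)):
                seats = list(range(t))
                random.shuffle(seats)          # randomize the playing order
                players = [tribes[s][picks[s]] for s in seats]
                winner_tribe = seats[play_game(players)]
                wins[winner_tribe][picks[winner_tribe]] += 1
            # Keep the best individual per tribe and refill that tribe with its
            # mutated copies -- the tribes never exchange genes.
            for i, tribe in enumerate(tribes):
                best = tribe[wins[i].index(max(wins[i]))]
                tribes[i] = [mutate(best) for _ in range(len(tribe))]
        return tribes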

maraca

To me, it seems like there is an easy fix: play multiple games each evaluation.

Instead of testing only one game per generation, which strongly favours the starting player, play 5 games and distribute who starts first equally (so every player starts first at least once).
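A minimal sketch of that rotation (play_game is a stub assumed to return one score per seat): each group of 5 genomes plays 5 games, the seating order is rotated so every genome gets the first seat exactly once, and fitness is summed over all 5 games.

    from collections import deque
    import random

    def play_game(genomes):
        """Stub: return the points scored by each seat, in seating order."""
        return [random.randint(0, 20) for _ in genomes]

    def evaluate_matchup(genomes):
        """Play one game per rotation of the seating order, so every genome
        gets the first seat exactly once; return total points per genome."""
        totals = [0] * len(genomes)
        order = deque(range(len(genomes)))
        for _ in range(len(genomes)):
            points = play_game([genomes[i] for i in order])
            for seat, idx in enumerate(order):
                totals[idx] += points[seat]
            order.rotate(-1)                 # next game, the next player starts
        return totals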


I suppose your population is larger than 5, right? So how are you testing the genomes against each other? You should definitely not let them play only one game each, because you might have paired up a medium player against 4 easy players, making it seem like the medium player is better than it is.
Thomas Wagenaar
  • Maybe I was unclear. I am playing multiple games in every generation: I create a population of x, play one game for every possible combination of players (all randomly mutated differently), keep the one who won the most games, and of course randomize the start order. They aren't always favoring the first player, but they always end up favoring a certain starting position, and as long as that part of the code isn't mutated out, they all evolve pretty randomly in what they do when they are in that "favored" position, since they will win anyway. – Kaito Kid May 15 '17 at 17:47