
I'm trying to implement an abstract minimax algorithm with alpha-beta pruning. The minimax part works great, but as soon as I add alpha-beta pruning, the AI starts to behave really strangely, even skipping obvious moves. I'm not sure what's going on.

This is what my recursive function looks like:

- (id<MMGameMove>)getBestMove:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    id<MMGameMove> bestMove = nil;
    NSArray *allMoves = [game allMoves];

    for (id<MMGameMove> move in allMoves)
    {
        //Take the move and evaluate the game's score
        id<MMGame> gameBoard = [game clone];
        move.player = player;
        [gameBoard saveMove:move];
        self.count++;

        if (depth == 0 || gameBoard.isOver)
        {
            move.rank = [gameBoard scoreForPlayer:self.playerId depth:depth];
        }
        else
        {
            MMPlayerSeed opponent = (player == self.playerId) ? self.opponentId : self.playerId;
            move.rank = [self getBestMove:gameBoard player:opponent depth:depth-1 alpha:alpha beta:beta].rank;
        }

        //If the new move is better than our previous move, take it
        BOOL minMove = (player == self.opponentId && move.rank <= beta);
        BOOL maxMove = (player == self.playerId && move.rank >= alpha);

        if (minMove || maxMove)
        {
            BOOL shouldPrune = NO;
            if (minMove)
            {
                beta = move.rank;
                if (alpha >= beta) {
                    shouldPrune = YES;
                }
            }
            else if (maxMove)
            {
                alpha = move.rank;
                if (alpha <= beta) {
                    shouldPrune = YES;
                }
            }

            bestMove = move;

            if (shouldPrune && depth < self.maxDepth) {
                break;
            }
        }
    }

    return bestMove;
}

My initial call looks like this:

[self getBestMove:game player:self.playerId depth:self.maxDepth alpha:INT_MIN beta:INT_MAX];

As far as I understand, for the same game state, alpha-beta pruning should give me the very same move as minimax does without it, but with this implementation this is clearly not the case.
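The equivalence claimed above can be checked on a tiny fixed tree. The following is a minimal sketch in plain C (which also compiles as Objective-C), not the code from the question: the heap-layout tree, the `LEAVES` scores, and the `visited` counter are all hypothetical, chosen only for illustration. It runs plain minimax and a textbook alpha-beta search on the same tree. Note that this formulation uses the same cutoff test, `alpha >= beta`, for both players.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical toy game: a complete binary tree in heap layout.
   Node i has children 2i+1 and 2i+2; the 8 leaves hold static scores. */
#define DEPTH 3
static const int LEAVES[1 << DEPTH] = {3, 5, 6, 9, 1, 2, 0, -1};

int visited; /* leaves evaluated since the counter was last reset */

int leaf_score(int node) {
    int first_leaf = (1 << DEPTH) - 1;
    assert(node >= first_leaf && node < first_leaf + (1 << DEPTH));
    visited++;
    return LEAVES[node - first_leaf];
}

/* Plain minimax: always looks at every leaf. */
int minimax(int node, int depth, int maximizing) {
    if (depth == 0) return leaf_score(node);
    int best = maximizing ? INT_MIN : INT_MAX;
    for (int c = 1; c <= 2; c++) {
        int score = minimax(2 * node + c, depth - 1, !maximizing);
        if (maximizing ? (score > best) : (score < best)) best = score;
    }
    return best;
}

/* Alpha-beta: same value at the root, but stops scanning siblings
   once the window [alpha, beta] is empty. */
int alphabeta(int node, int depth, int alpha, int beta, int maximizing) {
    if (depth == 0) return leaf_score(node);
    int best = maximizing ? INT_MIN : INT_MAX;
    for (int c = 1; c <= 2; c++) {
        int score = alphabeta(2 * node + c, depth - 1, alpha, beta, !maximizing);
        if (maximizing) {
            if (score > best) best = score;
            if (best > alpha) alpha = best;
        } else {
            if (score < best) best = score;
            if (best < beta) beta = best;
        }
        if (alpha >= beta) break; /* identical test for max and min nodes */
    }
    return best;
}
```

On this tree, `minimax(0, DEPTH, 1)` and `alphabeta(0, DEPTH, INT_MIN, INT_MAX, 1)` both return 5. The plain search evaluates all 8 leaves, while the pruned search evaluates 5. At the root, beta stays at `INT_MAX`, so the `alpha >= beta` test never fires there and every root move is examined.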

EDIT 1

After the suggested modification there was another error: I was pruning root nodes. I edited the code to reflect the correct answer. After running minimax both with and without alpha-beta pruning, I can see that both yield the same result, and I was also able to confirm the performance gain from adding alpha-beta.

EDIT 2

The code posted above is actually not working as expected. I followed xXliolauXx's suggestion, but I still can't get it working. I get the right values at depth = 0 or game over, but it seems they are not passed back recursively to the corresponding root move. For example, I can see that my heuristic returns -3 for one child of the first root move and 0 for the rest of its children. So I expect the first root move to report -3 instead of 0, since that's the worst-case scenario the computer might face if it makes that move.

Here's my new code:

- (NSInteger)alphabeta:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    if (depth == 0 || game.isOver)
    {
        return [game scoreForPlayer:self.playerId depth:depth];
    }

    MMPlayerSeed opponent = (player == self.playerId) ? self.opponentId : self.playerId;

    for (id<MMGameMove> move in game.allMoves)
    {
        id<MMGame> gameCopy = [game clone];
        move.player = player;
        [gameCopy saveMove:move];
        self.count++;

        NSInteger score = [self alphabeta:gameCopy player:opponent depth:depth-1 alpha:alpha beta:beta];

        if (player == self.playerId)
        {
            if (depth == self.maxDepth)
            {
                move.rank = @(score);
                [self.rootMoves addObject:move];
            }

            alpha = MAX(alpha, score);

            if (beta < alpha)
            {
                break;
            }
        }
        else
        {
            beta = MIN(beta, score);

            if (beta < alpha)
            {
                break;
            }
        }
    }

    return (player == self.playerId) ? alpha : beta;
}

Note that I prune when beta < alpha while maximizing. Otherwise, it will always prune after scanning the first root move.

This is how I initiate the recursion:

[self alphabeta:game player:self.playerId depth:self.maxDepth alpha:NSIntegerMin beta:NSIntegerMax];

EDIT 3

I think I got it. Rather than returning alpha or beta, I return the best (or worst) score. I need to clean up my code to make it more readable, but here's what it looks like right now:

- (NSInteger)alphabeta:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    if (depth == 0 || game.isOver)
    {
        return [game scoreForPlayer:self.playerId depth:depth];
    }

    MMPlayerSeed opponent;
    NSInteger bestScore;

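    // NSIntegerMin/NSIntegerMax stand in for -infinity/+infinity here:
    // converting the floating-point INFINITY to NSInteger is undefined in C.
    if (player == self.playerId)
    {
        opponent = self.opponentId;
        bestScore = NSIntegerMin;
    }
    else
    {
        opponent = self.playerId;
        bestScore = NSIntegerMax;
    }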

    for (id<MMGameMove> move in game.allMoves)
    {
        id<MMGame> gameCopy = [game clone];
        move.player = player;
        [gameCopy saveMove:move];
        self.count++;

        NSInteger score = [self alphabeta:gameCopy player:opponent depth:depth-1 alpha:alpha beta:beta];

        if (player == self.playerId)
        {
            bestScore = MAX(bestScore, score);
            alpha = MAX(alpha, bestScore);

            if (depth == self.maxDepth)
            {
                move.rank = @(score);
                [self.rootMoves addObject:move];
            }

            if (beta < alpha)
            {
                break;
            }
        }
        else
        {
            bestScore = MIN(bestScore, score);
            beta = MIN(beta, bestScore);

            if (beta < alpha)
            {
                break;
            }
        }
    }

    return bestScore;
}
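One possible reason why returning the best score helps when root moves are ranked individually: a search that returns alpha or beta (often called fail-hard) clamps every child value into the current window, so a root move that is really worse can come back with the same score as the best one. The sketch below is plain C (which also compiles as Objective-C) on a hypothetical toy tree, not the game from the question. The `LEAVES` scores, the `fail_soft` flag, and `rank_root_moves` are illustrative assumptions. It keeps the strict `beta < alpha` cutoff used in the code above.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical toy tree in heap layout: node i has children 2i+1, 2i+2.
   True minimax values of the two root moves are 5 and 0. */
#define DEPTH 3
static const int LEAVES[1 << DEPTH] = {3, 5, 6, 9, 1, 2, 0, -1};

static int max_int(int a, int b) { return a > b ? a : b; }
static int min_int(int a, int b) { return a < b ? a : b; }

/* fail_soft != 0: return the best score actually seen.
   fail_soft == 0: return alpha (max node) or beta (min node). */
int search(int node, int depth, int alpha, int beta,
           int maximizing, int fail_soft) {
    if (depth == 0) {
        int first_leaf = (1 << DEPTH) - 1;
        assert(node >= first_leaf && node < first_leaf + (1 << DEPTH));
        return LEAVES[node - first_leaf];
    }
    int best = maximizing ? INT_MIN : INT_MAX;
    for (int c = 1; c <= 2; c++) {
        int score = search(2 * node + c, depth - 1, alpha, beta,
                           !maximizing, fail_soft);
        if (maximizing) {
            best = max_int(best, score);
            alpha = max_int(alpha, best);
        } else {
            best = min_int(best, score);
            beta = min_int(beta, best);
        }
        if (beta < alpha) break; /* strict cutoff, as in the question */
    }
    if (fail_soft) return best;
    return maximizing ? alpha : beta;
}

/* Scores both root moves for the maximizing player, raising alpha
   between moves the way a root loop would. */
void rank_root_moves(int fail_soft, int ranks[2]) {
    int alpha = INT_MIN;
    for (int c = 1; c <= 2; c++) {
        ranks[c - 1] = search(c, DEPTH - 1, alpha, INT_MAX, 0, fail_soft);
        alpha = max_int(alpha, ranks[c - 1]);
    }
}
```

With `fail_soft` set to 0, both root moves are ranked 5, even though the second move is really worth 0. With `fail_soft` set to 1, the ranks are 5 and 2, where 2 is only an upper bound on the second move's value but still sorts below the best move. The root value is 5 either way, so this matters only when per-move ranks are compared afterwards.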
mdonati
  • Well, at least we know that the mistake must be somewhere in the minimax part. I'll have a look at it... – xXliolauXx Jul 20 '15 at 09:53
  • I've been studying this algorithm for two weeks now and I swear I can't get a grasp of it. I really don't understand why it is failing, so thanks for taking a look at it again. If I follow your suggestion about pruning when maximizing && alpha <= beta, it will always analyze just one root node because I set beta = INFINITY so alpha will always be less than that after analyzing the first root node all the way down. – mdonati Jul 20 '15 at 14:15
  • Forget about that comment, I wrote it during lunch and wasn't really thinking -.- I'll compare it to my own algorithm this evening. – xXliolauXx Jul 20 '15 at 14:46
  • This MUST be it! For the cutoff decision, instead of alpha > beta, try score <= alpha when maximising and score >= beta when minimising. – xXliolauXx Jul 20 '15 at 17:43
  • Really appreciate you took a look at it. It still has errors. Sometimes it overlooks and sometimes it reports the wrong number. I'm guessing this probably is related to something else in my code. I'm almost dreaming at night about all the different minimax alpha-beta pruning implementations I've been checking and implementing on my own and for all of them there seems to be an error. The weird thing is that my code looks pretty much the same as other implementations that claim to work well. Thanks for giving it a try. Really appreciate the fact you took some time to look at it. – mdonati Jul 21 '15 at 18:54
  • When comparing to others, you really have to watch out whether it's minimax or negamax. Btw, what language is that? It is terribly messy :/ – xXliolauXx Jul 21 '15 at 18:58
  • And btw. Pls update your code. Debug it carefully, maybe with depth = 2 and compare the alpha beta results to the minimax results you got previously... – xXliolauXx Jul 22 '15 at 04:24
  • Thanks for taking a look again. I updated my code after debugging this further and found out what the issue is. See my last edit. The language is Objective-C, btw. It's funny, because you may hate it when you start programming in it, but you start loving it after a few months of writing it all day. – mdonati Jul 22 '15 at 07:26
  • Cool, it works now, but it should actually also work when returning alpha or beta, respectively. Also be aware that you are losing quite a bit of AB efficiency by setting your bestScore to Infinity/-Infinity, but that can be fixed later. – xXliolauXx Jul 22 '15 at 09:52
  • I also don't like the usage of a "bestScore" variable: alpha or beta, respectively, should do that work. I'll get myself an Obj-C compiler and look at it during runtime... – xXliolauXx Jul 22 '15 at 09:58

1 Answer


The mistake seems to be in the part that decides whether pruning is possible (it is written as in a negamax alpha-beta, while you are using a minimax alpha-beta).

To fix it, simply add an if that checks whether you are currently maximising or minimising (as you did at the point where you change alpha or beta).

When you are minimising, you prune whenever alpha >= beta; when maximising, vice versa.

After that, the code should work fine (if there aren't other mistakes ;) ).
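For readers unsure about the distinction drawn here, the following is a minimal sketch of the two formulations side by side, in plain C (which also compiles as Objective-C). It is not the asker's code. The heap-layout tree, the `LEAVES` scores, and the `color` parameter are hypothetical and exist only for illustration. Negamax always maximizes from the side to move, negating the child's result and swapping the negated window, so it needs only one branch. Minimax keeps separate max and min branches.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical toy tree in heap layout: node i has children 2i+1, 2i+2.
   LEAVES are scored from the root (maximizing) player's point of view. */
#define DEPTH 3
static const int LEAVES[1 << DEPTH] = {3, 5, 6, 9, 1, 2, 0, -1};

static int leaf(int node) {
    int first_leaf = (1 << DEPTH) - 1;
    assert(node >= first_leaf && node < first_leaf + (1 << DEPTH));
    return LEAVES[node - first_leaf];
}

/* Minimax alpha-beta: two branches, one per player. */
int minimax_ab(int node, int depth, int alpha, int beta, int maximizing) {
    if (depth == 0) return leaf(node);
    int best = maximizing ? INT_MIN : INT_MAX;
    for (int c = 1; c <= 2; c++) {
        int score = minimax_ab(2 * node + c, depth - 1, alpha, beta, !maximizing);
        if (maximizing) {
            if (score > best) best = score;
            if (best > alpha) alpha = best;
        } else {
            if (score < best) best = score;
            if (best < beta) beta = best;
        }
        if (alpha >= beta) break;
    }
    return best;
}

/* Negamax alpha-beta: one branch. color is +1 when the root player is
   to move and -1 otherwise; scores are always from the mover's side.
   Call with a window of -INT_MAX..INT_MAX so negation cannot overflow. */
int negamax_ab(int node, int depth, int alpha, int beta, int color) {
    if (depth == 0) return color * leaf(node);
    int best = -INT_MAX;
    for (int c = 1; c <= 2; c++) {
        int score = -negamax_ab(2 * node + c, depth - 1, -beta, -alpha, -color);
        if (score > best) best = score;
        if (best > alpha) alpha = best;
        if (alpha >= beta) break;
    }
    return best;
}
```

On this tree, `minimax_ab(0, DEPTH, INT_MIN, INT_MAX, 1)` and `negamax_ab(0, DEPTH, -INT_MAX, INT_MAX, 1)` both return 5. Mixing the two styles, for example negating scores in one place but keeping separate max and min branches elsewhere, is an easy way to end up with a search that prunes the wrong moves.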

xXliolauXx
  • Thanks for trying to help. I edited the code to include your suggestion. It seems to be doing the same thing. I guess there's something else then. I'm trying to figure out why though because other than the issue you mentioned the implementation looks good to me according to what I've been reading and also according to other implementations of the algorithm. – mdonati Jul 14 '15 at 22:13