
I'm making a C++ program for the game chopsticks.

It's a really simple game with only 625 total game states (and even fewer if you account for symmetry and unreachable states). I have read up on the minimax and alpha-beta algorithms, mostly in the context of tic-tac-toe, but the problem I was having is that in tic-tac-toe it's impossible to loop back to a previous state, while that can easily happen in chopsticks. So when running the code it would end up with a stack overflow.

I fixed this by adding flags for previously visited states so that they can be avoided (I don't know if that's the right way to do it), but now the problem I have is that the output is not symmetric as expected.

For example, in the start state of the game each player has one finger on each hand, so everything is symmetric. The program tells me that the best move is to hit my right hand with my left, but not the opposite.

My source code is -

#include <iostream>
#include <array>
#include <vector>
#include <limits>
std::array<int, 625> t; //Visited flags/cached scores used in the maximizing branch of minimax.
std::array<int, 625> f; //Visited flags/cached scores used in the minimizing branch of minimax.
int no = 0; //Unused. For debugging.
class gamestate
{
public:
    gamestate(int x, bool t) : turn(t) //Constructor.
    {
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                val[i][j] = x % 5;
                x /= 5;
            }
        init();
    }
    void print() //Unused. For debugging.
    {
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++)
                std::cout << val[i][j] << "\t";
            std::cout << "\n";
        }
        std::cout << "\n";
    }
    std::array<int, 6> canmove = {{ 1, 1, 1, 1, 1, 1 }}; //Available moves: 0/1 = my hand 0 hits opponent's hand 0/1, 2 = hand 0 hits my hand 1 (or splits), 3/4 = my hand 1 hits opponent's hand 0/1, 5 = hand 1 hits my hand 0 (or splits).
    bool isover() //Is the game over.
    {
        return ended;
    }
    bool won() //Who won the game.
    {
        return winner;
    }
    bool isturn() //Whose turn it is.
    {
        return turn;
    }
    std::vector<int> choosemoves() //Choose the best possible moves in the current state.
    {
        std::vector<int> bestmoves;
        if(ended)
            return bestmoves;
        std::array<int, 6> scores;
        int bestscore;
        if(turn)
            bestscore = std::numeric_limits<int>::min();
        else
            bestscore = std::numeric_limits<int>::max();
        scores.fill(bestscore);
        for (int i = 0; i < 6; i++)
            if (canmove[i]) {
                t.fill(0);
                f.fill(0);
                gamestate *play = new gamestate(this->playmove(i),!turn);
                scores[i] = minimax(play, 0, std::numeric_limits<int>::min(), std::numeric_limits<int>::max());
                std::cout<<i<<": "<<scores[i]<<std::endl;
                delete play;
                if (turn) if (scores[i] > bestscore) bestscore = scores[i];
                if (!turn) if (scores[i] < bestscore) bestscore = scores[i];
            }
        for (int i = 0; i < 6; i++)
            if (scores[i] == bestscore)
                bestmoves.push_back(i);
        return bestmoves;
    }
private:
    std::array<std::array<int, 2>, 2 > val; //The values of the fingers.
    bool turn; //Whose turn it is.
    bool ended = false; //Has the game ended.
    bool winner; //Who won the game.
    void init() //Check if the game has ended and find the available moves.
    {
        if (!(val[turn][0]) && !(val[turn][1])) {
            ended = true;
            winner = !turn;
            canmove.fill(0);
            return;
        }
        if (!(val[!turn][0]) && !(val[!turn][1])) {
            ended = true;
            winner = turn;
            canmove.fill(0);
            return;
        }
        if (!val[turn][0]) {
            canmove[0] = 0;
            canmove[1] = 0;
            canmove[2] = 0;
            if (val[turn][1] % 2)
                canmove[5] = 0;
        }
        if (!val[turn][1]) {
            if (val[turn][0] % 2)
                canmove[2] = 0;
            canmove[3] = 0;
            canmove[4] = 0;
            canmove[5] = 0;
        }
        if (!val[!turn][0]) {
            canmove[0] = 0;
            canmove[3] = 0;
        }
        if (!val[!turn][1]) {
            canmove[1] = 0;
            canmove[4] = 0;
        }
    }
    int playmove(int mov) //Play a move to get the next game state.
    {
        auto newval = val;
        switch (mov) {
        case 0:
            newval[!turn][0] = (newval[turn][0] + newval[!turn][0]);
            newval[!turn][0] = (5 > newval[!turn][0]) ? newval[!turn][0] : 0;
            break;
        case 1:
            newval[!turn][1] = (newval[turn][0] + newval[!turn][1]);
            newval[!turn][1] = (5 > newval[!turn][1]) ? newval[!turn][1] : 0;
            break;
        case 2:
            if (newval[turn][1]) {
                newval[turn][1] = (newval[turn][0] + newval[turn][1]);
                newval[turn][1] = (5 > newval[turn][1]) ? newval[turn][1] : 0;
            } else {
                newval[turn][0] /= 2;
                newval[turn][1] = newval[turn][0];
            }
            break;
        case 3:
            newval[!turn][0] = (newval[turn][1] + newval[!turn][0]);
            newval[!turn][0] = (5 > newval[!turn][0]) ? newval[!turn][0] : 0;
            break;
        case 4:
            newval[!turn][1] = (newval[turn][1] + newval[!turn][1]);
            newval[!turn][1] = (5 > newval[!turn][1]) ? newval[!turn][1] : 0;
            break;
        case 5:
            if (newval[turn][0]) {
                newval[turn][0] = (newval[turn][1] + newval[turn][0]);
                newval[turn][0] = (5 > newval[turn][0]) ? newval[turn][0] : 0;
            } else {
                newval[turn][1] /= 2;
                newval[turn][0] = newval[turn][1];
            }
            break;
        default:
            std::cout << "\nInvalid move!\n";
        }
        int ret = 0;
        for (int i = 1; i > -1; i--)
            for (int j = 1; j > -1; j--) {
                ret+=newval[i][j];
                ret*=5;
            }
        ret/=5;
        return ret;
    }
    static int minimax(gamestate *game, int depth, int alpha, int beta) //Minimax searching function with alpha beta pruning.
    {
        if (game->isover()) {
            if (game->won())
                return 1000 - depth;
            else
                return depth - 1000;
        }
        if (game->isturn()) {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]&&t[game->playmove(i)]!=-1) {
                    int score;
                    if(!t[game->playmove(i)]){
                        t[game->playmove(i)] = -1;
                        gamestate *play = new gamestate(game->playmove(i),!game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        t[game->playmove(i)] = score;
                    }
                    else
                        score = t[game->playmove(i)];
                    if (score > alpha) alpha = score;
                    if (alpha >= beta) break;
                }
            return alpha;
        } else {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]&&f[game->playmove(i)]!=-1) {
                    int score;
                    if(!f[game->playmove(i)]){
                        f[game->playmove(i)] = -1;
                        gamestate *play = new gamestate(game->playmove(i),!game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        f[game->playmove(i)] = score;
                    }
                    else
                        score = f[game->playmove(i)];
                    if (score < beta) beta = score;
                    if (alpha >= beta) break;
                }
            return beta;
        }
    }
};
int main(void)
{
    gamestate test(243, true);
    auto movelist = test.choosemoves();
    for(auto i: movelist)
        std::cout<<i<<std::endl;
    return 0;
}

I'm passing the game states around in a sort of base-5 to decimal system, as each hand can have values from 0 to 4.
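
To make the encoding concrete, here is a small standalone sketch of the same packing (encode and decode are just illustrative names; in my program the constructor and playmove() do this inline):

#include <array>
#include <cassert>
//val[0][0] is the least significant base-5 digit, val[1][1] the most significant.
int encode(const std::array<std::array<int, 2>, 2> &val)
{
    return val[0][0] + 5 * val[0][1] + 25 * val[1][0] + 125 * val[1][1];
}
std::array<std::array<int, 2>, 2> decode(int x)
{
    std::array<std::array<int, 2>, 2> val;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            val[i][j] = x % 5; //Each hand holds 0 to 4 fingers.
            x /= 5;
        }
    return val;
}
int main()
{
    auto v = decode(243); //The state used below.
    assert(v[0][0] == 3 && v[0][1] == 3 && v[1][0] == 4 && v[1][1] == 1);
    assert(encode(v) == 243); //Round-trips back to the same index.
    return 0;
}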

In the code I have input the state -

3    3
4    1

The output says I should hit my right hand (1) to the opponent's right hand (3), but it does not say I should hit it to my opponent's left hand (also 3).

I think the problem is because of the way I handled infinite looping.

What would be the right way to do it? Or if that is the right way, then how do I fix the problem?

Also please let me know how I can improve my code.

Thanks a lot.

Edit:

I have changed my minimax function as follows to ensure that infinite loops are scored above losing, but I'm still not getting symmetry. I also made a function to add depth to the score (these need #include <cmath> for std::atan, std::tan, and std::acos).

    static float minimax(gamestate *game, int depth, float alpha, float beta) //Minimax searching function with alpha beta pruning.
    {
        if (game->isover()) {
            if (game->won())
                return 1000 - std::atan(depth) * 2000 / std::acos(-1);
            else
                return std::atan(depth) * 2000 / std::acos(-1) - 1000;
        }
        if (game->isturn()) {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]) {
                    float score;
                    if(!t[game->playmove(i)]) {
                        t[game->playmove(i)] = -1001;
                        gamestate *play = new gamestate(game->playmove(i), !game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        t[game->playmove(i)] = score;
                    } else if(t[game->playmove(i)] == -1001)
                        score = 0;
                    else
                        score = adddepth(t[game->playmove(i)], depth);
                    if (score > alpha) alpha = score;
                    if (alpha >= beta) break;
                }
            return alpha;
        } else {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]) {
                    float score;
                    if(!f[game->playmove(i)]) {
                        f[game->playmove(i)] = -1001;
                        gamestate *play = new gamestate(game->playmove(i), !game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        f[game->playmove(i)] = score;
                    } else if(f[game->playmove(i)] == -1001)
                        score = 0;
                    else
                        score = adddepth(f[game->playmove(i)], depth);
                    if (score < beta) beta = score;
                    if (alpha >= beta) break;
                }
            return beta;
        }
    }

This is the function to add depth -

float adddepth(float score, int depth) //Add depth to a pre-calculated score.
{
    int olddepth;
    float newscore;
    if(score > 0) {
        //Invert score = 1000 - atan(depth) * 2000 / pi to recover the depth already baked into the score.
        olddepth = std::tan((1000 - score) * std::acos(-1) / 2000);
        depth += olddepth;
        newscore = 1000 - std::atan(depth) * 2000 / std::acos(-1);
    } else {
        //Same inversion for negative (losing) scores.
        olddepth = std::tan((1000 + score) * std::acos(-1) / 2000);
        depth += olddepth;
        newscore = std::atan(depth) * 2000 / std::acos(-1) - 1000;
    }
    return newscore;
}
  • Is it possible that this asymmetry bug was already in your code before you added the flags? Or was your partial output (before the buffer overflowed) symmetric as expected? – Colin P. Hill Jun 30 '15 at 14:06
  • For states which have a quick solution (only a few levels deep) I am getting symmetric results and I used to before I added the flags too. But for states which require deeper searching (such as the initial state) I didn't used to get any result from the program without the flags as it would always crash and now with them I'm getting asymmetric results. – idk Jun 30 '15 at 14:41
  • So this addition of flags introduced a bug into your code. But the flags might not be a good solution to the problem they were meant to address. Are you developing under version control? If so, were I in your position I might just revert to before that change and develop a better fix for the overflow problem from that known-symmetric starting point. – Colin P. Hill Jun 30 '15 at 19:56
  • No, no. I don't know if the bug is caused by the addition of flags because I get the same (symmetric) result for easier states as I used to before adding them. And for the difficult states earlier it would crash before I added the flags and now it gives asymmetric results. – idk Jul 01 '15 at 05:06
  • It's only a personal project so I can revert back if needed, but I can't think of a better solution. – idk Jul 01 '15 at 05:08

1 Answer


Disclaimer: I don't know C++, and I frankly hadn't bothered to read the game rules at first. I have now read the rules and still stand by what I said, but I still don't know C++. Still, I can present some general knowledge of the algorithm which should set you off in the right direction.

Asymmetry is not in itself a bad thing. If two moves are exactly equivalent, it should choose one of them and not stand helpless like Buridan's ass. You should, in fact, be sure that any agent you write has some method of choosing arbitrarily between policies which it cannot distinguish.
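
For instance, in rough C++ (untested; treat it as a sketch), once you have a list of moves that tie for the best score, breaking the tie can be as simple as:

#include <random>
#include <vector>
//Pick uniformly at random among the moves that tied for the best score,
//so the agent never stands helpless between equivalent policies.
int pickAmongBest(const std::vector<int> &bestmoves) //Assumes bestmoves is non-empty.
{
    static std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<std::size_t> dist(0, bestmoves.size() - 1);
    return bestmoves[dist(rng)];
}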

You should think more carefully about the utility scheme implied by refusing to visit previous states. Pursuing an infinite loop is a valid policy, even if your current representation of it will crash the program; maybe the bug is the overflow, not the policy that caused it. If given the choice between losing the game and refusing to let the game end, which do you want your agent to prefer?

Playing ad infinitum

If you want your agent to avoid losing at all costs -- that is, you want it to prefer indefinite play over loss -- then I would suggest treating any repeated state as a terminal state and assigning it a value somewhere between winning and losing. After all, in a sense it is terminal -- this is the loop the game will enter forever and ever and ever, and the definite result of it is that there is no winner. However, remember that if you are using simple minimax (one utility function, not two), then this implies that your opponent also regards eternal play as a middling result.
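
Concretely, here is a rough, untested C++ sketch of the idea on a toy game graph (not your chopsticks code): keep track of the states on the current search path, and when a state repeats, score it as a terminal draw instead of recursing into it or skipping it.

#include <algorithm>
#include <iostream>
#include <unordered_set>
#include <vector>
//Toy game graph: state 3 is a win for the maximizer, and 0 <-> 1 is a loop.
bool isTerminal(int s) { return s == 3; }
int terminalValue(int) { return 1000; }
std::vector<int> successors(int s)
{
    switch (s) {
    case 0: return {1};
    case 1: return {0, 2};
    case 2: return {3};
    default: return {};
    }
}
int minimaxWithLoops(int state, bool maximizing, std::unordered_set<int> &onPath)
{
    if (isTerminal(state))
        return terminalValue(state);
    if (onPath.count(state))
        return 0; //A repeated state means a forced loop: score it as a draw.
    onPath.insert(state);
    int best = maximizing ? -10000 : 10000;
    for (int next : successors(state)) {
        int score = minimaxWithLoops(next, !maximizing, onPath);
        best = maximizing ? std::max(best, score) : std::min(best, score);
    }
    onPath.erase(state); //Undo on the way back up: only the current path counts.
    return best;
}
int main()
{
    std::unordered_set<int> path;
    std::cout << minimaxWithLoops(0, true, path) << "\n"; //Prints 0: the minimizer prefers the loop to losing.
    return 0;
}

Note the erase on the way back out of the recursion: membership in the set only means "this state is an ancestor in the current line of play", so the same state can still be evaluated legitimately when it is reached along a different line.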

It may sound ridiculous, but maybe playing unto infinity is actually a reasonable policy. Remember that minimax assumes the worst case -- a perfectly rational foe whose interests are the exact opposite of yours. But if, for example, you're writing an agent to play against a human, then the human will either err logically, or will eventually decide they would rather end the game by losing -- so your agent will benefit from patiently staying in this Nash equilibrium loop!

Alright, let's end the game already

If you want your agent to prefer that the game end eventually, then I would suggest implementing a living penalty -- a modifier added to your utility which decreases as a function of time (be it asymptotic or without bound). Implemented carefully, this can guarantee that, eventually, any end is preferable to another turn. With this solution as well, you need to be careful about considering what preferences this implies for your opponent.
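
As a rough illustration (the constants are arbitrary, and this is only a sketch of the shape), the penalty can either be bounded, so that even a very slow win still outranks a draw, or unbounded, so that eventually any ending beats playing another turn:

#include <cmath>
//Bounded penalty: approaches 1000 but never reaches it, so every win stays
//positive and every loss stays negative no matter how deep it is.
double boundedPenalty(int depth) { return std::atan(depth) * 2000.0 / std::acos(-1.0); }
//Unbounded penalty: grows without limit, so a sufficiently delayed win
//eventually scores worse than an immediate draw.
double unboundedPenalty(int depth) { return 1.0 * depth; }
double winScore(int depth)  { return 1000.0 - boundedPenalty(depth); }
double lossScore(int depth) { return boundedPenalty(depth) - 1000.0; }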

A third way

Another common solution is to depth-limit your search and implement an evaluation function. This takes the game state as its input and just spits out a utility value which is its best guess at the end result. Is this provably optimal? No, not unless your evaluation function is just completing the minimax, but it means your algorithm will finish within a reasonable time. By burying this rough estimate deep enough in the tree, you wind up with a pretty reasonable model. However, this produces an incomplete policy, which means that it is more useful for a replanning agent than for a standard planning agent. Minimax replanning is the usual approach for complex games (it is, if I'm not mistaken, the basic algorithm followed by Deep Blue), but since this is a very simple game you probably don't need to take this approach.
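
A rough sketch of the shape (the game-specific hooks here are placeholders, not your chopsticks rules):

#include <algorithm>
#include <vector>
//Placeholder hooks -- substitute the real rules and a real heuristic.
bool isGameOver(int) { return false; }
int finalValue(int) { return 0; }
int evaluate(int state) { return state % 9 - 4; } //Placeholder heuristic; a real one might be my total fingers minus the opponent's.
std::vector<int> nextStates(int s) { return { (s + 1) % 625, (s + 7) % 625 }; } //Placeholder move generator.
int depthLimitedMinimax(int state, int depth, bool maximizing)
{
    if (isGameOver(state))
        return finalValue(state);
    if (depth == 0)
        return evaluate(state); //Cut the search off here and guess instead of recursing further.
    int best = maximizing ? -10000 : 10000;
    for (int next : nextStates(state)) {
        int score = depthLimitedMinimax(next, depth - 1, !maximizing);
        best = maximizing ? std::max(best, score) : std::min(best, score);
    }
    return best;
}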

A side note on abstraction

Note that all of these solutions are conceptualized as either numeric changes to or estimations of the utility function. This is, in general, preferable to arbitrarily throwing away possible policies. After all, that's what your utility function is for -- any time you make a policy decision on the basis of anything except the numeric value of your utility, you are breaking your abstraction and making your code less robust.

  • Thanks for your answer but if you had glanced over the code you would have seen that my function returns a "list" of the best possible moves. I intend to pass this list to another program which will then choose randomly from the best moves, which is why when there are symmetric moves I expect both of them to be in the list. As for the infinite play, I'm looking to make a perfect player so I'd definitely prefer that over losing. I modified my function to use arctan to make sure the score tends to zero at infinity and is zero for looping but I'm still not getting symmetry. – idk Jun 30 '15 at 06:34
  • Not knowing C++, I would have seen no such thing! All these colons and angle brackets are foreign to me, and you have minimal explanatory comments -- I'm here because I know the algorithm, and I'm hoping some prodding in that more abstract realm can help you find your solution. – Colin P. Hill Jun 30 '15 at 14:02
  • Your answer certainly helped to choose a better function (arctan) for evaluating the end game state, thanks a lot for that, but I still have no clue about how to deal with the infinite looping and more importantly how to get symmetry. At this point I'm considering just writing a function which checks if the game is symmetric and adds the relevant moves to the list but I would really like a better solution. – idk Jun 30 '15 at 14:44
  • RE: infinite looping, is there any reason my suggestion under "Playing ad infinitum" won't work? You already have the flags -- you can just repurpose them so that they cause the state to be evaluated as a terminal state, instead of skipping the state entirely. – Colin P. Hill Jun 30 '15 at 20:03
  • Of course that contradicts the other comment I just made on the question, but maybe reimplementing flags all over again with a different goal in mind might be a good idea, if you can't find the asymmetry bug. – Colin P. Hill Jun 30 '15 at 20:05
  • Yeah I have done that with the arctan thing. What it does is that if the state is currently being calculated (i.e. it is in the loop) it gets a zero score and the score for losing is always negative and for winning is always positive. This way the infinite loop is preferred over losing. – idk Jul 01 '15 at 05:16