Approximate Q learning in pacman java

Question

I have been working on a Pacman AI using approximate Q-learning. I don't have a background in machine learning. At the moment, there are no ghosts in the maze. The maze is large, 31 × 36. The feature I currently have is the distance to a dot, as on page six of this. Pacman's exploration works fine.
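A minimal sketch of such a distance feature, assuming the maze is stored as a `char[][]` grid with `'#'` for walls and `'.'` for dots (the class and method names are hypothetical, not from the question). It uses a breadth-first search, so the distance respects walls, and it divides the result by the maze area so the feature stays in [0, 1]. Raw distances in the hundreds on a 31 × 36 maze can make the linear weight updates unstable. In approximate Q-learning the feature is normally evaluated for the cell Pacman would reach after taking action a, so each of the four actions gets its own value.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical maze encoding: '#' = wall, '.' = dot, ' ' = empty cell.
class DotDistanceFeature {

    // Breadth-first search from (row, col) to the closest dot.
    // Returns the number of moves, or -1 if no dot is reachable.
    static int distanceToClosestDot(char[][] maze, int row, int col) {
        int rows = maze.length, cols = maze[0].length;
        boolean[][] seen = new boolean[rows][cols];
        Deque<int[]> queue = new ArrayDeque<>();
        queue.add(new int[] {row, col, 0});          // {row, col, distance}
        seen[row][col] = true;
        int[][] moves = {{-1, 0}, {0, 1}, {1, 0}, {0, -1}}; // N, E, S, W
        while (!queue.isEmpty()) {
            int[] cur = queue.poll();
            if (maze[cur[0]][cur[1]] == '.') {
                return cur[2];
            }
            for (int[] m : moves) {
                int r = cur[0] + m[0], c = cur[1] + m[1];
                if (r >= 0 && r < rows && c >= 0 && c < cols
                        && !seen[r][c] && maze[r][c] != '#') {
                    seen[r][c] = true;
                    queue.add(new int[] {r, c, cur[2] + 1});
                }
            }
        }
        return -1;
    }

    // Distance scaled by the maze area so the feature stays in [0, 1];
    // returns 0 when no dot is left.
    static double feature(char[][] maze, int row, int col) {
        int d = distanceToClosestDot(maze, row, col);
        return d < 0 ? 0.0 : (double) d / (maze.length * maze[0].length);
    }

    public static void main(String[] args) {
        char[][] maze = {
            "#####".toCharArray(),
            "#  .#".toCharArray(),
            "#####".toCharArray(),
        };
        System.out.println(distanceToClosestDot(maze, 1, 1)); // prints 2
    }
}
```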

The problem I'm facing now is that, from a particular state, the Q table (which holds North, East, South and West Q values for each state) tells Pacman to go left; then, in the "left" state, the table tells Pacman to go right, and so it loops.

I have tried to curb that problem using this:

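    // Double.MIN_VALUE is the smallest *positive* double, so start from -infinity
    // to let zero or negative Q values win.
    double maxValue = Double.NEGATIVE_INFINITY;
    int desiredIndex = 0;
    for (int i = 0; i < 4; i++) {
        // Skip the direction taken on the previous move.
        if (i == prevDir) {
            continue;
        }
        double val = Q[currState.getIndex()][i];
        if (val > maxValue) {
            maxValue = val;
            desiredIndex = i;
        }
    }
    return desiredIndex;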

with no luck.
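A sketch of an action selector that blocks the move that would undo the previous one, rather than the previous move itself, assuming indices 0 to 3 stand for North, East, South and West in the order the question lists them (so the reverse of `d` is `(d + 2) % 4`). The class and method names are hypothetical. The running maximum starts at `Double.NEGATIVE_INFINITY`: `Double.MIN_VALUE` is the smallest positive double, so with it a state whose Q values are all zero or negative falls through to the default index 0. Blocking reversals only hides the symptom; it does not fix the Q values that caused the oscillation.

```java
// Hypothetical direction encoding, following the question's order:
// 0 = North, 1 = East, 2 = South, 3 = West.
class GreedyActionSelector {

    static int opposite(int dir) {
        return (dir + 2) % 4;
    }

    // Returns the legal action with the highest Q value, skipping the move that
    // would undo the previous one. prevDir < 0 means "no previous move".
    // Falls back to turning back when that is the only legal move (a dead end).
    // Returns -1 only if no action is legal at all.
    static int bestAction(double[] qValues, boolean[] legal, int prevDir) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < qValues.length; a++) {
            if (!legal[a] || (prevDir >= 0 && a == opposite(prevDir))) {
                continue;
            }
            if (qValues[a] > bestValue) {
                bestValue = qValues[a];
                best = a;
            }
        }
        if (best == -1 && prevDir >= 0 && legal[opposite(prevDir)]) {
            best = opposite(prevDir); // dead end: turning back is the only move
        }
        return best;
    }

    public static void main(String[] args) {
        double[] q = {-0.5, -0.2, -0.9, -0.1};
        boolean[] legal = {true, true, true, true};
        // Previous move was East (1), so West (3) is skipped despite its higher value.
        System.out.println(bestAction(q, legal, 1)); // prints 1
    }
}
```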

Also, I don't understand how Pacman is supposed to learn this way. How should the training be done? How will Pacman know where to move once he has learned?
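In approximate Q-learning there is no per-state table: Q(s, a) is computed on the fly as a weighted sum of features, Q(s, a) = Σ w_i · f_i(s, a), and only the weights w_i are learned. Training means playing many games with some exploration (ε-greedy) and, after every move, nudging each weight by α · difference · f_i(s, a), where difference = (r + γ · max_a' Q(s', a')) − Q(s, a). Once trained, Pacman sets ε to 0 and, in every state, takes the legal move with the highest Q(s, a). The sketch below is a minimal illustration of that loop; the class name, constructor parameters and the way feature vectors are passed in are hypothetical, and the game loop that produces features and rewards is left out.

```java
import java.util.Random;

// Minimal linear approximate Q-learning agent (a sketch, not the asker's code).
// Q(s, a) = sum_i w[i] * f_i(s, a); only the weight vector is learned.
class ApproximateQAgent {
    final double[] weights;
    final double alpha;   // learning rate
    final double gamma;   // discount factor
    double epsilon;       // exploration rate; set to 0 once training is done
    final Random rng;

    ApproximateQAgent(int numFeatures, double alpha, double gamma, double epsilon, long seed) {
        this.weights = new double[numFeatures];
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
        this.rng = new Random(seed);
    }

    // features = f(s, a), the feature vector for taking action a in state s.
    double qValue(double[] features) {
        double q = 0.0;
        for (int i = 0; i < weights.length; i++) {
            q += weights[i] * features[i];
        }
        return q;
    }

    // actionFeatures[k] = f(s, a_k) for the k-th legal action (at least one).
    // Returns k, an index into the legal-action list.
    int chooseAction(double[][] actionFeatures) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(actionFeatures.length); // explore
        }
        int best = 0;
        for (int k = 1; k < actionFeatures.length; k++) {
            if (qValue(actionFeatures[k]) > qValue(actionFeatures[best])) {
                best = k;
            }
        }
        return best; // exploit
    }

    // One learning step after observing (s, a, reward, s').
    // nextActionFeatures holds f(s', a') for every legal a' in s';
    // pass an empty array when s' is terminal (e.g. the last dot was eaten).
    void update(double[] features, double reward, double[][] nextActionFeatures) {
        double maxNext = 0.0;
        if (nextActionFeatures.length > 0) {
            maxNext = Double.NEGATIVE_INFINITY;
            for (double[] f : nextActionFeatures) {
                maxNext = Math.max(maxNext, qValue(f));
            }
        }
        double difference = (reward + gamma * maxNext) - qValue(features);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += alpha * difference * features[i];
        }
    }

    public static void main(String[] args) {
        ApproximateQAgent agent = new ApproximateQAgent(2, 0.5, 0.9, 0.0, 42);
        // Feature 0: bias, feature 1: scaled distance to the closest dot after the move.
        double[] f = {1.0, 0.2};
        agent.update(f, 10.0, new double[0][]); // terminal transition with reward 10
        System.out.println(agent.weights[0] + " " + agent.weights[1]); // prints "5.0 1.0"
    }
}
```

During training, call `chooseAction` with ε > 0, apply the move, compute the reward and call `update`, repeating over many games. Afterwards set `epsilon = 0` and only call `chooseAction`; the learned weights are all Pacman needs to rank the moves in any state, including states it never visited.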

Help is much appreciated.

Where do you store `prevDir`? Also, if I understand correctly, you need to find the highest value in your loop over the four directions to know where to go? I am not experienced with machine learning, but if you are training with 'rewards' and 'punishments', you need to 'punish' your model for getting stuck (going back). — B-GangsteR, Dec 22 '17 at 03:26
`prevDir` is a global variable inside my Q-learning class. Yes, I pick the action with the max Q value for the given state. That's a good idea; I hadn't thought of that. Thank you. — Levi, Dec 23 '17 at 04:28
Where do you assign a value to `prevDir`? For example, suppose your model goes left, then right, then left again, and so on. After it has moved left, the previous value will be `left`, but what you need to block is going right, because that is the move that gets your model stuck. So the move to block is not the previous move but its reverse. Also, note that even if you handle this situation, it won't protect you from the case where your model just cycles through four or more cells. — B-GangsteR, Dec 24 '17 at 02:17
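Both points from the comments can be handled with one reward rule, sketched below: remember the last few cells Pacman entered and subtract a penalty for each earlier visit still in that window. An immediate left/right reversal re-enters a remembered cell, and so does a longer loop through four or more cells, as long as the loop fits in the window. The window size, the reward constants and the class name are illustrative assumptions, not values from the question. A plain per-step cost already makes any loop accumulate negative return, which may be enough on its own. Note that the penalty depends on history the features do not see, so it is a heuristic rather than a principled fix.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical reward shaping; every constant below is an illustrative assumption.
// Record the start cell with visit() before the first move so that the very
// first reversal is also caught.
class LoopPenalty {
    static final double DOT_REWARD = 10.0;
    static final double STEP_COST = -1.0;       // every move costs a little
    static final double REVISIT_PENALTY = -2.0; // per earlier visit inside the window

    private final int capacity;                 // how many recent cells to remember
    private final Deque<Long> history = new ArrayDeque<>();

    LoopPenalty(int capacity) {
        this.capacity = capacity;
    }

    // Packs (row, col) into one long so cells can be compared cheaply.
    private static long key(int row, int col) {
        return ((long) row << 32) | (col & 0xffffffffL);
    }

    // Records the cell Pacman just entered and returns how many times it
    // already appeared among the last `capacity` recorded cells.
    int visit(int row, int col) {
        long k = key(row, col);
        int count = 0;
        for (long h : history) {
            if (h == k) {
                count++;
            }
        }
        history.addLast(k);
        if (history.size() > capacity) {
            history.removeFirst(); // forget the oldest cell
        }
        return count;
    }

    // Reward for moving into (row, col): step cost, revisit penalty, dot bonus.
    double reward(int row, int col, boolean ateDot) {
        double r = STEP_COST + REVISIT_PENALTY * visit(row, col);
        if (ateDot) {
            r += DOT_REWARD;
        }
        return r;
    }

    public static void main(String[] args) {
        LoopPenalty shaper = new LoopPenalty(8);
        // A 4-cell cycle: the last step returns to the first cell.
        int[][] path = {{1, 1}, {1, 2}, {2, 2}, {2, 1}, {1, 1}};
        for (int[] cell : path) {
            System.out.println(shaper.reward(cell[0], cell[1], false)); // -1.0 four times, then -3.0
        }
    }
}
```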
