I have been working on a Pacman AI using approximate Q-learning, and I don't have a background in machine learning. At the moment there are no ghosts in the maze. The maze is large, 31 × 36. The only feature I currently have is the distance to a dot, as in page six of this. Pacman's exploration works fine.
The problem I'm facing is that, from a particular state, the Q table (which holds the North, East, South and West Q-values for each state) tells Pacman to go left; then, in the state to the left, the table tells Pacman to go right, so he loops between the two states forever.
I have tried to curb that problem using this:
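    // Start below any possible Q-value (Double.MIN_VALUE is the smallest positive double, not the most negative).
    double maxValue = Double.NEGATIVE_INFINITY;
    int desiredIndex = 0;
    for (int i = 0; i < 4; i++) {
        double val = Q[currState.getIndex()][i];
        // System.out.println(val);
        if (val > maxValue) {
            // Skip the direction Pacman moved in on the previous step.
            if (i == prevDir)
                continue;
            maxValue = val;
            desiredIndex = i;
        }
    }
    return desiredIndex;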
with no luck.
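For comparison, here is a minimal sketch of a variant that excludes the opposite of the previous direction rather than the previous direction itself, since the loop described above comes from reversing. The class name `GreedyNoReverse`, the helper `opposite`, and the encoding 0 = North, 1 = East, 2 = South, 3 = West are assumptions for illustration, not part of my actual code, and walls are ignored, so a dead end would need a special case.

```java
public class GreedyNoReverse {
    // Assumed encoding (hypothetical): 0 = North, 1 = East, 2 = South, 3 = West.
    static int opposite(int dir) {
        return (dir + 2) % 4;
    }

    // Returns the index of the highest Q-value, never choosing the reverse of prevDir.
    // Pass prevDir = -1 when there is no previous move.
    static int chooseAction(double[] qValues, int prevDir) {
        int reverse = (prevDir < 0) ? -1 : opposite(prevDir);
        double maxValue = Double.NEGATIVE_INFINITY;
        int desiredIndex = -1; // stays -1 only if no action is comparable (e.g. all NaN)
        for (int i = 0; i < qValues.length; i++) {
            if (i == reverse) {
                continue; // do not walk straight back to the previous state
            }
            if (qValues[i] > maxValue) {
                maxValue = qValues[i];
                desiredIndex = i;
            }
        }
        return desiredIndex;
    }
}
```

With this sketch, `GreedyNoReverse.chooseAction(Q[currState.getIndex()], prevDir)` would take the place of the loop above. It only hides the oscillation; it does not explain why the two states prefer each other in the first place.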
Also, I don't understand how Pacman is supposed to learn using this. How should the training be done, and how will Pacman know where to move once he has learned?
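To make the question concrete, this is my understanding of the textbook approximate Q-learning loop, written as a rough sketch rather than my real code. It assumes Q(s, a) is a weighted sum of features, an epsilon-greedy policy during training, and the usual update w_i ← w_i + α · (r + γ · max Q(s', a') − Q(s, a)) · f_i(s, a). The class `ApproxQSketch` and all of its method and parameter names are hypothetical, and how the feature vectors are computed from the maze is left out.

```java
import java.util.Random;

public class ApproxQSketch {
    final double[] weights;   // one weight per feature, learned during training
    final double alpha;       // learning rate
    final double gamma;       // discount factor
    final double epsilon;     // exploration probability while training
    final Random rng;

    ApproxQSketch(int numFeatures, double alpha, double gamma, double epsilon, long seed) {
        this.weights = new double[numFeatures];
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
        this.rng = new Random(seed);
    }

    // Q(s, a) = sum_i w_i * f_i(s, a), where features = f(s, a).
    double qValue(double[] features) {
        double q = 0.0;
        for (int i = 0; i < weights.length; i++) {
            q += weights[i] * features[i];
        }
        return q;
    }

    // Highest Q-value over the given actions; 0 when there are none (terminal state).
    double maxQ(double[][] featuresPerAction) {
        if (featuresPerAction.length == 0) {
            return 0.0;
        }
        double best = Double.NEGATIVE_INFINITY;
        for (double[] f : featuresPerAction) {
            best = Math.max(best, qValue(f));
        }
        return best;
    }

    // featuresPerAction[a] is the feature vector for taking action a in the current state.
    // While training, explore with probability epsilon; otherwise act greedily.
    int chooseAction(double[][] featuresPerAction, boolean training) {
        if (training && rng.nextDouble() < epsilon) {
            return rng.nextInt(featuresPerAction.length);
        }
        int best = 0;
        for (int a = 1; a < featuresPerAction.length; a++) {
            if (qValue(featuresPerAction[a]) > qValue(featuresPerAction[best])) {
                best = a;
            }
        }
        return best;
    }

    // One learning step after taking an action, observing a reward and the next state.
    void update(double[] features, double reward, double[][] nextFeaturesPerAction) {
        double target = reward + gamma * maxQ(nextFeaturesPerAction);
        double difference = target - qValue(features);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += alpha * difference * features[i];
        }
    }
}
```

If this is right, training would mean playing many episodes while calling `chooseAction(..., true)` and `update(...)` on every step, and after training Pacman would move by calling `chooseAction(..., false)`, so the learned weights decide the direction rather than a per-state table.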
Help is much appreciated.