I've been trying to program an AI for tic-tac-toe using a multilayer perceptron and backpropagation. My idea was to train the neural network to be an accurate evaluation function for board states, but the problem is that even after analyzing thousands of games, the network does not output accurate evaluations.
I'm using 27 input neurons; each square on the 3x3 board is associated with three input neurons that receive values of 0 or 1 depending on whether the square has an x, o or is blank. These 27 input neurons send signals to 10 hidden neurons (I chose 10 arbitrarily, but I have tried with 5 and 15 as well).
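To make the input encoding concrete, here is a minimal Python sketch of it. The function name encode_board, the 9-character string used for the board, and the order of the three inputs per square (x, o, blank) are assumptions made for this illustration only.

```python
def encode_board(board):
    """Turn a 9-character board string into 27 inputs, each 0 or 1.

    ASSUMPTION: the board is a string of 'x', 'o' and ' ' (blank),
    read row by row. This representation is for illustration only.
    """
    symbols = ("x", "o", " ")
    inputs = []
    for square in board:
        # Three inputs per square; exactly one of them is set to 1.
        inputs.extend(1 if square == s else 0 for s in symbols)
    return inputs


example = encode_board("xo  x   o")
print(len(example))   # 27
print(example[:6])    # [1, 0, 0, 0, 1, 0]
```

Each group of three inputs sums to 1, so every board position always switches on exactly 9 of the 27 input neurons.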
For training, I've had the program generate a series of games by playing against itself using the current evaluation function to select what are deemed optimal moves for each side. After generating a game, the NN compiles training examples (which comprise a board state and the correct output) by taking the correct output for a given board state to be the value (using the evaluation function) of the board state that follows it in the game sequence. I think this is what Gerald Tesauro did when programming TD-Gammon, but I might have misinterpreted the article. (note: I put the specific mechanism for updating weights at the bottom of this post).
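The way training examples are built from one finished game can be sketched as follows. The names make_training_examples, evaluate and final_outcome are hypothetical. The description above does not say what target the last board state of a game gets, since no state follows it, so the sketch assumes it is given the actual game result; that part is an assumption, not something stated above.

```python
def make_training_examples(states, evaluate, final_outcome):
    """Pair each board state with the evaluation of the state after it.

    states        -- board states of one game, in the order they occurred
    evaluate      -- the current evaluation function (board state -> value)
    final_outcome -- ASSUMPTION: target for the last state, taken to be
                     the real game result (e.g. 1 win, 0.5 draw, 0 loss),
                     because the last state has no successor
    """
    examples = []
    for current, following in zip(states, states[1:]):
        # The "correct output" for a state is the current evaluation
        # of the state that follows it in the game sequence.
        examples.append((current, evaluate(following)))
    if states:
        examples.append((states[-1], final_outcome))
    return examples


# Toy usage with placeholder states and a fixed table of evaluations.
values = {"s0": 0.5, "s1": 0.6, "s2": 0.8}
print(make_training_examples(["s0", "s1", "s2"], values.get, 1.0))
# [('s0', 0.6), ('s1', 0.8), ('s2', 1.0)]
```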
I have tried various values for the learning rate, as well as varying numbers of hidden neurons, but nothing seems to work. Even after hours of "learning," there is no discernible improvement in strategy and the evaluation function is not anywhere close to accurate.
I realize that there are much easier ways to program tic-tac-toe, but I want to do it with a multilayer perceptron so that I can apply it to Connect 4 later on. Is this even possible? I'm starting to think that there is no reliable evaluation function for a tic-tac-toe board with a reasonable number of hidden neurons.
I assure you that I am not looking for some quick code to turn in for a homework assignment. I've been working unsuccessfully for a while now and would just like to know what I'm doing wrong. All advice is appreciated.
This is the specific mechanism I used for the NN:
Each of the 27 input neurons receives a 0 or 1, which passes through the differentiable sigmoid function 1/(1+e^(-x)). Each input neuron i sends this output (i.output), multiplied by some weight (i.weights[h]) to each hidden neuron h. The sum of these values is taken as input by the hidden neuron h (h.input), and this input passes through the sigmoid to form the output for each hidden neuron (h.output). I denote the lastInput to be the sum of (h.output * h.weight) across all of the hidden neurons. The outputted value of the board is then sigmoid(lastInput).
I denote the learning rate to be alpha, and err to be the correct output minus the actual output. Also, I let dSigmoid(x) equal the derivative of the sigmoid at the point x.
The weight of each hidden neuron h is incremented by the value: (alpha*err*dSigmoid(lastInput)*h.output) and the weight of the signal from a given input neuron i to a given hidden neuron h is incremented by the value: (alpha*err*dSigmoid(lastInput)*h.weight*dSigmoid(h.input)*i.output).
I got these formulas from this lecture on backpropagation: http://www.youtube.com/watch?v=UnWL2w7Fuo8 .
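Written out in Python, the forward pass and weight updates described above look roughly like the sketch below. It follows the description literally (inputs pass through the sigmoid, one output neuron, no bias terms). The function names, the list-based weight layout (w_in[i][h] for input-to-hidden weights, w_hidden[h] for hidden-to-output weights) and the random initialization range are assumptions for illustration. Both updates are computed from the weights as they were before the step, which is one detail the formulas leave implicit.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def d_sigmoid(x):
    # Derivative of the sigmoid at the point x.
    s = sigmoid(x)
    return s * (1.0 - s)


def forward(raw_inputs, w_in, w_hidden):
    """Return every intermediate value needed for the weight updates."""
    n_hidden = len(w_hidden)
    i_output = [sigmoid(v) for v in raw_inputs]
    h_input = [sum(i_output[i] * w_in[i][h] for i in range(len(i_output)))
               for h in range(n_hidden)]
    h_output = [sigmoid(x) for x in h_input]
    last_input = sum(h_output[h] * w_hidden[h] for h in range(n_hidden))
    return i_output, h_input, h_output, last_input, sigmoid(last_input)


def train_step(raw_inputs, target, w_in, w_hidden, alpha):
    """Apply one update for a single training example; return err."""
    i_output, h_input, h_output, last_input, output = forward(
        raw_inputs, w_in, w_hidden)
    err = target - output
    delta_out = err * d_sigmoid(last_input)
    # Computed before w_hidden changes, so the old weights are used.
    delta_hidden = [delta_out * w_hidden[h] * d_sigmoid(h_input[h])
                    for h in range(len(w_hidden))]
    for h in range(len(w_hidden)):
        w_hidden[h] += alpha * delta_out * h_output[h]
        for i in range(len(i_output)):
            w_in[i][h] += alpha * delta_hidden[h] * i_output[i]
    return err


# Toy check: repeat one example and watch the error shrink.
random.seed(0)
n_inputs, n_hidden = 27, 10
w_in = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        for _ in range(n_inputs)]
w_hidden = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
inputs = [1, 0, 0] * 9            # placeholder board: every square 'x'
first_err = train_step(inputs, 0.9, w_in, w_hidden, alpha=0.1)
for _ in range(200):
    last_err = train_step(inputs, 0.9, w_in, w_hidden, alpha=0.1)
print(abs(first_err), abs(last_err))
```

A check like this on a single fixed example only shows that the update rule moves the output toward its target; it says nothing about whether the targets themselves are accurate.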