
I've been trying to program an AI for tic tac toe using a multilayer perceptron and backpropagation. My idea was to train the neural network to be an accurate evaluation function for board states, but the problem is that even after analyzing thousands of games, the network still does not output accurate evaluations.

I'm using 27 input neurons; each square on the 3x3 board is associated with three input neurons that receive values of 0 or 1 depending on whether the square contains an x, an o, or is blank. These 27 input neurons send signals to 10 hidden neurons (I chose 10 arbitrarily, but I have also tried 5 and 15).
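Concretely, the encoding I mean looks roughly like this (a Python sketch for illustration only; the list-of-characters board representation is just an assumption to make the example self-contained):

    # Sketch of the 27-input one-hot encoding: three inputs per square,
    # one each for "x", "o" and blank.
    def encode_board(board):
        """board: list of 9 characters, each 'x', 'o' or ' ' (blank)."""
        inputs = []
        for square in board:
            inputs.extend([1 if square == 'x' else 0,
                           1 if square == 'o' else 0,
                           1 if square == ' ' else 0])
        return inputs  # length 27

    # An empty board activates every third ("blank") input neuron.
    print(encode_board([' '] * 9))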

For training, I've had the program generate a series of games by playing against itself, using the current evaluation function to select what are deemed optimal moves for each side. After generating a game, the NN compiles training examples (which comprise a board state and the correct output) by taking the correct output for a given board state to be the value (using the evaluation function) of the board state that follows it in the game sequence. I think this is what Gerald Tesauro did when programming TD-Gammon, but I might have misinterpreted the article. (Note: the specific mechanism for updating weights is at the bottom of this post.)
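In code, the example-generation step looks roughly like this (Python sketch; evaluate stands for my current network, and the handling of the final position reflects my reading of the TD-Gammon article rather than something I'm certain of):

    def make_training_examples(game_states, evaluate, final_result):
        """Pair each position with the evaluation of the position that follows it.

        game_states:  the positions from one self-play game, in order.
        evaluate:     the current network's evaluation function.
        final_result: the actual outcome (e.g. 1.0 win, 0.5 draw, 0.0 loss).
        """
        examples = [(state, evaluate(game_states[i + 1]))
                    for i, state in enumerate(game_states[:-1])]
        # My reading of the TD-Gammon article is that the terminal position gets
        # the real game outcome as its target, so the bootstrapped values stay
        # anchored to actual results; I'm treating that as an assumption here.
        examples.append((game_states[-1], final_result))
        return examples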

I have tried various values for the learning rate, as well as varying numbers of hidden neurons, but nothing seems to work. Even after hours of "learning," there is no discernible improvement in strategy, and the evaluation function is nowhere close to accurate.

I realize that there are much easier ways to program tic tac toe, but I want to do it with a multilayer perceptron so that I can apply it to Connect 4 later on. Is this even possible? I'm starting to think that there is no reliable evaluation function for a tic tac toe board with a reasonable number of hidden neurons.

I assure you that I am not looking for some quick code to turn in for a homework assignment. I've been working unsuccessfully for a while now and would just like to know what I'm doing wrong. All advice is appreciated.


This is the specific mechanism I used for the NN:

Each of the 27 input neurons receives a 0 or 1, which passes through the differentiable sigmoid function 1/(1+e^(-x)). Each input neuron i sends this output (i.output), multiplied by some weight (i.weights[h]), to each hidden neuron h. The sum of these values is taken as the input to hidden neuron h (h.input), and this input passes through the sigmoid to form the output of each hidden neuron (h.output). I denote by lastInput the sum of (h.output * h.weight) across all of the hidden neurons. The output value of the board is then sigmoid(lastInput).
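Written with arrays instead of the per-neuron values above (a numpy sketch; the array names W_ih and w_ho are just for illustration), the forward pass is:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(inputs, W_ih, w_ho):
        """inputs: 27 values of 0/1; W_ih: 27x10 input-to-hidden weights;
        w_ho: 10 hidden-to-output weights (illustrative names)."""
        i_output = sigmoid(np.asarray(inputs, dtype=float))  # each 0/1 input passes through the sigmoid
        h_input = i_output @ W_ih                            # weighted sum arriving at each hidden neuron
        h_output = sigmoid(h_input)                          # hidden activations
        last_input = h_output @ w_ho                         # sum of h.output * h.weight
        return sigmoid(last_input), i_output, h_input, h_output, last_input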

I denote the learning rate by alpha, and err is the correct output minus the actual output. I also let dSigmoid(x) denote the derivative of the sigmoid at the point x.

The weight of each hidden neuron h is incremented by the value (alpha*err*dSigmoid(lastInput)*h.output), and the weight of the connection from a given input neuron i to a given hidden neuron h is incremented by the value (alpha*err*dSigmoid(lastInput)*h.weight*dSigmoid(h.input)*i.output).
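Continuing the array-based sketch from above, the update step I'm applying looks like this:

    def dsigmoid(x):
        s = sigmoid(x)
        return s * (1.0 - s)

    def update(inputs, target, W_ih, w_ho, alpha):
        """One training step, reusing sigmoid() and forward() from the sketch above."""
        output, i_output, h_input, h_output, last_input = forward(inputs, W_ih, w_ho)
        err = target - output
        delta_out = alpha * err * dsigmoid(last_input)
        # input-to-hidden: alpha * err * dSigmoid(lastInput) * h.weight * dSigmoid(h.input) * i.output
        W_ih += np.outer(i_output, delta_out * w_ho * dsigmoid(h_input))
        # hidden-to-output: alpha * err * dSigmoid(lastInput) * h.output
        w_ho += delta_out * h_output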

I got these formulas from this lecture on backpropagation: http://www.youtube.com/watch?v=UnWL2w7Fuo8 .

Site

4 Answers


Tic tac toe has 3^9 = 19683 states (actually, some of them aren't legal, but the order of magnitude is right). The output function isn't smooth, so I think the best a backpropagation network can do is "rote learning" a look-up table for all these states.

With that in mind, 10 hidden neurons seems very small, and there's no way you can train 20k different look-up-table entries by teaching it a few thousand games. For that, the network would have to "extrapolate" from states it has been taught to states it has never seen, and I don't see how it could do that.

Niki
  • I realize that a look up table is definitely feasible for tic-tac-toe, and I successfully programmed one with all the board states. As I mentioned in my original post, I'm looking to program tic tac toe using a neural network so that I can apply the same ideas to a Connect 4 algorithm, where a look up table would not work. – Site May 22 '12 at 05:00
  • 4
    @Site: But that's just the point, your ANN *is* just a lookup table. How is it supposed to be able to judge positions it has never seen? This is only possible if the function from input to output is smooth, i.e. the network can interpolate between trained patterns. You could try to give it different input features, where interpolation might work. For example, number of rows/columns/diagonals where there is only one/two pieces of the player or the enemy. – Niki May 22 '12 at 07:07
  • To add to Niki's answer: I tried the same approach as you (27 input neurons, 1 output neuron) and I also was not able to teach the network. – Krzysztof Kazmierczyk Jun 11 '18 at 19:57
  • @Niki: An NN can "extrapolate" from states in tic tac toe, because some optimal actions can be determined when looking only at parts of the game. – pianissimo Jun 23 '19 at 20:10
  • @pianissimo: You're right that the number 3^9 isn't accurate - you wouldn't have to store *every* position, and some of them aren't even legal. It's just an order of magnitude. Having 10 hidden nodes simply can't work, even if the network is smart enough to "extrapolate" some positions by only looking at parts of the game. If you used e.g. a CNN, then the net could probably generalize from one row to the other or from one column to the other. But a fully-connected network has to "memorize" more or less everything for every position. – Niki Jun 24 '19 at 05:58
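A rough sketch of the kind of derived input features suggested in the comments above (counting lines that contain only the player's or only the opponent's pieces); the board representation here is just an assumption for the example:

    # Rough sketch of line-based input features, as suggested in the comments.
    # board: list of 9 characters ('x', 'o' or ' '); player/opponent: 'x' or 'o'.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def line_features(board, player, opponent):
        features = []
        for who in (player, opponent):
            other = opponent if who == player else player
            for count in (1, 2):
                # lines holding exactly `count` of who's pieces and none of the other side's
                n = sum(1 for line in LINES
                        if sum(board[i] == who for i in line) == count
                        and all(board[i] != other for i in line))
                features.append(n)
        return features  # four small numbers the network could interpolate over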

You might want to consider more than one hidden layer, as well as upping the size of the hidden layer. For comparison purposes, Fogel and Chellapilla used two layers of 40 and 10 neurons to program up a checkers player, so if you need something more than that, something is probably going terribly wrong.

You might also want to use bias inputs, if you're not already.
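For illustration, here is a minimal numpy sketch of both suggestions together (two hidden layers of 40 and 10 units, plus explicit bias vectors); the sizes and names are placeholders, not a recommendation of specific values:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Two hidden layers (40 and 10 units, as in the checkers comparison) with biases.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(scale=0.1, size=(27, 40)), np.zeros(40)
    W2, b2 = rng.normal(scale=0.1, size=(40, 10)), np.zeros(10)
    W3, b3 = rng.normal(scale=0.1, size=10), 0.0

    def evaluate(inputs):
        x = np.asarray(inputs, dtype=float)   # the 27 board inputs
        h1 = sigmoid(x @ W1 + b1)
        h2 = sigmoid(h1 @ W2 + b2)
        return sigmoid(h2 @ W3 + b3)          # scalar board evaluation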

Your basic methodology seems sound, although I'm not 100% sure what you mean by this:

After generating a game, the NN compiles training examples (which comprise a board state and the correct output) by taking the correct output for a given board state to be the value (using the evaluation function) of the board state that follows it in the game sequence.

I think you mean that you're using some known-good method (like a minimax game tree) to determine the "correct" answers for the training examples. Can you explain that a little bit? Or, if I'm correct, it seems like there's a subtlety to deal with, in terms of symmetric boards, which might have more than one equally good best response. If you're only treating one of those as correct, that might lead to problems. (Or it might not, I'm not sure.)

Novak
  • 1
    Thanks for the reply. To generate the training examples I actually wasn't using a known good method; I just had the program play against itself using its current evaluation function. Of course original game sequences would be random, but the idea is that as the neural net converges to a good evaluation function the training sets become more optimal. I read online that this is the approach that Tesauro used when programming TD-Gammon. – Site May 22 '12 at 04:39

Just to throw in another thought: have you thought about using reinforcement learning for this task? It would be much easier to implement and much more effective. For example, you could use Q-learning, which is often used for games.
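The core of tabular Q-learning is only a few lines (Python sketch; the game/environment bookkeeping is left out, and states just need to be hashable, e.g. a tuple of the nine squares):

    import random
    from collections import defaultdict

    alpha, gamma, epsilon = 0.1, 0.9, 0.1
    Q = defaultdict(float)   # Q[(state, action)] -> estimated value

    def choose_action(state, legal_actions):
        # epsilon-greedy over the current Q estimates
        if random.random() < epsilon:
            return random.choice(legal_actions)
        return max(legal_actions, key=lambda a: Q[(state, a)])

    def q_update(state, action, reward, next_state, next_legal_actions):
        # standard Q-learning target: reward plus discounted best next value
        best_next = max((Q[(next_state, a)] for a in next_legal_actions), default=0.0)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])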

Ilovescience

Here you can find an implementation for training a neural network on Tic Tac Toe (variable board size) using self-play. The gradient is back-propagated through the whole game employing a simple gradient-copy trick.

pianissimo