
I'm having a hard time setting up a neural network to classify Tic-Tac-Toe board states (final or intermediate) as "X wins", "O wins" or "Tie".

I will describe my current solution and results. Any advice is appreciated.

* Data Set *

Dataset = 958 possible end-game states + 958 random game states = 1916 board states in total (the random games may be incomplete, but all states are legal, i.e. they never have both players winning simultaneously).

Training set = random sample of 1600 cases from the dataset
Test set = the remaining 316 cases

In my current pseudo-random development scenario the dataset has the following characteristics.

Training set:
- 527 wins for "X"
- 264 wins for "O"
- 809 ties

Test set:
- 104 wins for "X"
- 56 wins for "O"
- 156 ties
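For reproducibility, the split above can be generated with something like the following sketch (the names `dataset` and `split_dataset` and the fixed seed are hypothetical, not part of my actual setup):

```python
import random
from collections import Counter

# Minimal sketch: `dataset` is assumed to be a list of (board, label) pairs
# with label in {"X wins", "O wins", "Tie"}.
def split_dataset(dataset, n_train=1600, seed=0):
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Check the class balance of each split:
# train, test = split_dataset(dataset)
# print(Counter(label for _, label in train))
# print(Counter(label for _, label in test))
```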

* Modelling *

Input layer: 18 input neurons, two per board position (one for each player). Therefore, the board (B = blank):

x x o
o x B
B o x

is encoded as: 1 0  1 0  0 1  0 1  1 0  0 0  0 0  0 1  1 0

Output layer: 3 output neurons, one for each outcome (X wins, O wins, Tie).
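To make the modelling concrete, here is a minimal Python sketch of both encodings; the 9-character string representation and the helper names are my own illustration, not part of the setup above:

```python
# A board is written as a 9-character string over {'x', 'o', 'b'} (b = blank),
# row by row; each cell maps to the pair (is_x, is_o), giving 18 inputs.
LABELS = ["X wins", "O wins", "Tie"]

def encode_board(board):
    inputs = []
    for cell in board.lower():                 # 9 cells -> 9 * 2 = 18 inputs
        inputs += [1 if cell == "x" else 0,
                   1 if cell == "o" else 0]
    return inputs

def encode_label(label):
    return [1 if label == name else 0 for name in LABELS]  # one-hot target

# The example board from above:
assert encode_board("xxooxbbox") == \
    [1,0, 1,0, 0,1, 0,1, 1,0, 0,0, 0,0, 0,1, 1,0]
```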

* Architecture *

Based on: http://www.cs.toronto.edu/~hinton/csc321/matlab/assignment2.tar.gz

Single hidden layer
Hidden layer activation function: logistic
Output layer activation function: softmax
Error function: cross-entropy
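For clarity, here is a small numpy sketch of that forward pass and error function; this is only my illustration of the described design, not the code from the linked assignment:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    # x: length-18 input vector; W1: (n_hidden, 18); W2: (3, n_hidden)
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))   # logistic hidden layer
    z = W2 @ h + b2
    z = z - z.max()                            # for numerical stability
    y = np.exp(z) / np.exp(z).sum()            # softmax over the 3 outcomes
    return h, y

def cross_entropy(y, t):
    # t is the one-hot target vector for "X wins" / "O wins" / "Tie"
    return -np.sum(t * np.log(y + 1e-12))
```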

* Results *

No combination of parameters seems to achieve a 100% correct classification rate. Some examples:

NHidden  LRate   InitW  MaxEpoch  Epochs  FMom  TrainErrors  TestErrors
8        0.0025  0.01   10000     4500    0.8   0            7
16       0.0025  0.01   10000     2800    0.8   0            5
16       0.0025  0.1    5000      1000    0.8   0            4
16       0.0025  0.5    5000      5000    0.8   3            5
16       0.0025  0.25   5000      1000    0.8   0            5
16       0.005   0.25   5000      1000    0.9   10           5
16       0.005   0.25   5000      5000    0.8   15           5
16       0.0025  0.25   5000      1000    0.8   0            5
32       0.0025  0.25   5000      1500    0.8   0            5
32       0.0025  0.5    5000      600     0.9   0            5
8        0.0025  0.25   5000      3500    0.8   0            5

(NHidden = number of hidden units, LRate = learning rate, InitW = initial weight scale, MaxEpoch = epoch limit, Epochs = epochs actually trained, FMom = final momentum, TrainErrors/TestErrors = misclassified cases on the training/test set.)

Important - I know any of the following could be improved:
- The dataset characteristics (source and quantities of training and test cases) aren't the best.
- An alternative problem modelling (encoding of the input/output neurons) might be more suitable.
- A better network architecture (number of hidden layers, activation/error functions, etc.) might exist.

Assuming that my current choices in these respects, even if not optimal, should not prevent the system from reaching a 100% correct classification rate, I would like to focus on other possible issues.

In other words: considering the simplicity of the game, this dataset/modelling/architecture should suffice, so what am I doing wrong regarding the parameters?

I do not have much experience with ANNs, and my main intuition is the following: with 16 hidden neurons, the ANN could learn to associate each hidden unit with "a certain player winning in a certain way", since (3 rows + 3 columns + 2 diagonals) × 2 players = 16 winning configurations.

In this setting, an "optimal" set of weights is pretty straightforward: each hidden unit has large connection weights from the 3 input units corresponding to one winning line of one player (a row, column or diagonal), and a large connection weight to the output unit corresponding to that player's win.
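To illustrate (the magnitudes 10/20/5 are arbitrary choices of mine, and I assume boards with no winner are labelled "Tie"), such a weight set can be written down directly; plugged into a logistic-hidden/softmax forward pass like the one sketched above, it should classify every legal board correctly:

```python
import numpy as np

LINES = [(0,1,2), (3,4,5), (6,7,8),     # rows
         (0,3,6), (1,4,7), (2,5,8),     # columns
         (0,4,8), (2,4,6)]              # diagonals

# Under the 18-input encoding, input 2*i is "X at cell i" and input 2*i+1 is
# "O at cell i". One hidden unit per (player, line) pair = 16 units.
W1 = np.zeros((16, 18)); b1 = np.zeros(16)
W2 = np.zeros((3, 16));  b2 = np.array([0.0, 0.0, 5.0])   # bias favours "Tie"

for p in range(2):                      # p = 0 for X, p = 1 for O
    for k, line in enumerate(LINES):
        h = p * 8 + k
        for cell in line:
            W1[h, 2 * cell + p] = 10.0  # detect the 3 cells of this line
        b1[h] = -25.0                   # unit fires only if all 3 are owned
        W2[p, h] = 20.0                 # a firing unit means that player wins
```

With these weights, a hidden unit saturates near 1 only when its entire line is owned by its player, and the "Tie" bias wins whenever no line-detector fires.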

No matter what I do, I cannot decrease the number of test errors, as the above table shows.

Any advice is appreciated.

user1528976

1 Answer


You are doing everything right, but you're simply trying to tackle a difficult problem here, namely generalizing from some examples of tic-tac-toe configurations to all the others.

Unfortunately, the simple neural network you use neither perceives the spatial structure of the input (neighborhood) nor can it exploit its symmetries. So, in order to get perfect test accuracy, you can do one of the following:

  • increase the size of the dataset to include most (or all) possible configurations -- which the network will then be able to simply memorize, as indicated by the zero training error in most of your setups;

  • choose a different problem, where there is more structure to generalize from;

  • use a network architecture that can capture the symmetries (e.g. through weight-sharing) and/or the spatial relations of the inputs (e.g. through different input features); convolutional networks are just one example of this. A cheap variant of the same idea is to augment the training data with symmetry-transformed copies of each board, as sketched below.
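A minimal sketch of that augmentation, assuming a board is represented as a 9-character row-major string (that representation and the helper names are assumptions of mine):

```python
def symmetries(board):
    def rotate(b):    # rotate the 3x3 board 90 degrees clockwise
        return "".join(b[6 - 3 * (i % 3) + i // 3] for i in range(9))
    def reflect(b):   # mirror the board left-to-right
        return "".join(b[3 * (i // 3) + 2 - i % 3] for i in range(9))
    boards = set()
    for b in (board, reflect(board)):
        for _ in range(4):              # 4 rotations x 2 reflections
            boards.add(b)
            b = rotate(b)
    return boards                       # up to 8 distinct variants

# Each variant keeps the original label, so the augmented training set is
# {(s, label) for s in symmetries(board)} taken over all training boards.
```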

schaul