Introduction
As will be clear, I am not a machine learning expert; I work in a management position in data science and I am studying ML to understand its potential. Mostly as an exercise, I am training a neural network to predict the points obtained in a Bejeweled-like game given the initial board (say R rows, C columns, L colors). Following the DeepChess paper (https://arxiv.org/abs/1711.09667), I am not hand-crafting features but letting the NN find them. As input I therefore use R*C*L binary input neurons, plus 2*R*C additional input neurons for the 2 "special" gems (obtained by destroying more than 3 gems at a time). For an 8x8 board with the classic 5 colors, this amounts to 448 input neurons. I use two hidden layers (100, 20) with a sigmoid activation between layers. I train on a database I have built from a thousand games: the starting board as input and the points obtained after 5 moves as output.
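To make the encoding concrete, this is roughly how a board is turned into the input vector (the helper below is only illustrative; the names and array shapes are my own assumptions, not my actual code):

import numpy as np

R, C, L = 8, 8, 5  # rows, columns, colors

def encode_board(colors, specials):
    # colors: R x C array of color indices in [0, L)
    # specials: R x C x 2 binary array flagging the two "special" gem types
    # returns a flat binary vector of length R*C*(L+2) (448 for 8x8 and 5 colors)
    onehot = np.zeros((R, C, L))
    onehot[np.arange(R)[:, None], np.arange(C)[None, :], colors] = 1
    return np.concatenate([onehot.ravel(), specials.ravel()])

board = np.random.randint(0, L, size=(R, C))   # a random board, just for illustration
x = encode_board(board, np.zeros((R, C, 2)))   # len(x) == 448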
Question
Of course any suggestion is welcome, as I do not have professional experience in machine learning, but my question is more theoretical.
I was wondering how I could exploit the symmetry between the five colors. Indeed, permuting the colors (i.e. swapping the corresponding groups of input neurons) should change nothing. One option, which occurred to me while writing this question (as usual!), would be to simply enlarge the training set by adding all the color permutations (5! = 120) of each input board with the same output, as sketched below. A more refined and conceptually more appealing idea, however, would be to constrain the weights or the network structure to reflect this symmetry, so that any weight update would automatically keep the network symmetric. Is this feasible/advisable? How could it be implemented?
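To illustrate the first option, the augmentation I have in mind would look roughly like this (a sketch only; the board is assumed to be an R x C array of color indices, which is my own illustrative representation):

import numpy as np
from itertools import permutations

L = 5  # number of colors

def color_permutations(colors, target):
    # colors: R x C array of color indices; the target score is left unchanged
    # yields one relabelled copy of the board for each of the 5! = 120 permutations
    for perm in permutations(range(L)):
        mapping = np.array(perm)
        yield mapping[colors], target

For the second idea, the only construction I can think of is in the spirit of permutation-invariant ("Deep Sets"-like) architectures: apply the same small network to each of the L color planes and combine the results with a symmetric operation such as a sum, so the output cannot depend on how the colors are labelled. I do not know how to express this in pybrain (it seems to have shared connections that tie weights, but I have not tried them), so the sketch below is plain numpy, just to illustrate the constraint:

def phi(plane, W, b):
    # small shared feature extractor applied to the binary mask of one color
    return np.tanh(np.dot(plane, W) + b)

def color_invariant_features(onehot, W, b):
    # onehot: (R*C, L) binary matrix, one column per color
    # summing the per-color features makes the result invariant under any
    # relabelling of the L colors, because the sum ignores the column order
    return sum(phi(onehot[:, c], W, b) for c in range(L))

W = np.random.randn(64, 32)   # 64 = R*C for an 8x8 board, 32 hidden features (illustrative sizes)
b = np.zeros(32)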
Present Implementation
I use a very standard supervised-learning setup in pybrain. The dataset DS has R*C*(L+2) binary inputs, and the target is normalized (mean subtracted, divided by the standard deviation). There are two hidden layers, and everything is fully connected.
The dataset DS is split 80-20 into TrainDS and TestDS.
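For completeness, DS is built more or less like this (a sketch with placeholder data, just to show the shapes; the variable names are illustrative):

import numpy as np
from pybrain.datasets import SupervisedDataSet

R, C, L = 8, 8, 5
n_games = 1000

# placeholder data, only to show the shape of the dataset
inputs = np.random.randint(0, 2, size=(n_games, R * C * (L + 2)))
raw_points = np.random.rand(n_games) * 1000

# normalize the target: subtract the mean, divide by the standard deviation
targets = (raw_points - raw_points.mean()) / raw_points.std()

DS = SupervisedDataSet(R * C * (L + 2), 1)
for x, y in zip(inputs, targets):
    DS.addSample(x, y)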
import numpy as np
from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, FullConnection
from pybrain.supervised.trainers import BackpropTrainer

# network: R*C*(L+2) linear inputs, two sigmoid hidden layers (100 and 20 units), one sigmoid output
nn = FeedForwardNetwork()
inLayer = LinearLayer(R * C * (L + 2))
hidden1 = SigmoidLayer(100)
hidden2 = SigmoidLayer(20)
outLayer = SigmoidLayer(1)
nn.addInputModule(inLayer)
nn.addModule(hidden1)
nn.addModule(hidden2)
nn.addOutputModule(outLayer)

# all layers fully connected
in_to_hidden = FullConnection(inLayer, hidden1)
hidden1_to_hidden2 = FullConnection(hidden1, hidden2)
hidden2_to_out = FullConnection(hidden2, outLayer)
nn.addConnection(in_to_hidden)
nn.addConnection(hidden1_to_hidden2)
nn.addConnection(hidden2_to_out)
nn.sortModules()

# 80-20 split, then backprop training on the training set only
TrainDS, TestDS = DS.splitWithProportion(0.8)
trainer = BackpropTrainer(nn, dataset=TrainDS, momentum=0.1, verbose=True, weightdecay=0.01)
trainer.trainEpochs(100)  # the number of epochs is arbitrary here

# average absolute error on the (normalized) targets
print(sum(np.abs(nn.activateOnDataset(TrainDS) - TrainDS.data['target'][:len(TrainDS)])) / len(TrainDS))
print(sum(np.abs(nn.activateOnDataset(TestDS) - TestDS.data['target'][:len(TestDS)])) / len(TestDS))
I get an average error of 0.70 on the training set and 0.77 on the test set.