Introduction
As will be clear, I am not a machine learning expert; I work in a management position in data science and I am studying ML to understand its potential. Mostly as an exercise, I am training a neural network to predict the points obtained in a Bejeweled-like game given the initial board (say R rows, C columns, L colors). Following the DeepChess paper (https://arxiv.org/abs/1711.09667), I am not hand-crafting features but letting the NN find them. As input I therefore use R*C*L binary input neurons, plus 2*R*C additional input neurons for the 2 "special" gems (obtained by destroying more than 3 gems at a time). For an 8x8 board with the classic 5 colors, this amounts to 448 input neurons. I use two hidden layers (100, 20) with a sigmoid activation between layers. I train on a database I have built from a thousand games: the starting board as input and the points obtained after 5 moves as output.
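To make the encoding concrete, this is roughly how a board is turned into the input vector (the helper below is only illustrative; the names and array shapes are my own assumptions, not my actual code):

import numpy as np

R, C, L = 8, 8, 5  # rows, columns, colors

def encode_board(colors, specials):
    # colors: R x C array of color indices in [0, L)
    # specials: R x C x 2 binary array flagging the two "special" gem types
    # returns a flat binary vector of length R*C*(L+2) (448 for 8x8 and 5 colors)
    onehot = np.zeros((R, C, L))
    onehot[np.arange(R)[:, None], np.arange(C)[None, :], colors] = 1
    return np.concatenate([onehot.ravel(), specials.ravel()])

board = np.random.randint(0, L, size=(R, C))   # a random board, just for illustration
x = encode_board(board, np.zeros((R, C, 2)))   # len(x) == 448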
Question
Of course any suggestion is welcome, as I do not have professional experience in machine learning, but my question is more theoretical.
I was wondering how I could exploit the symmetry between the five colors. Indeed, permuting the colors (i.e. swapping the corresponding groups of input neurons) should change nothing. One option, which occurred to me while writing this question (as usual!), would be to simply enlarge the training set by adding all the color permutations (5! = 120) of each input board with the same output, as sketched below. A more refined and conceptually more appealing idea, however, would be to constrain the weights or the network structure to reflect this symmetry, so that any weight update would automatically keep the network symmetric. Is this feasible/advisable? How could it be implemented?
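To illustrate the first option, the augmentation I have in mind would look roughly like this (a sketch only; the board is assumed to be an R x C array of color indices, which is my own illustrative representation):

import numpy as np
from itertools import permutations

L = 5  # number of colors

def color_permutations(colors, target):
    # colors: R x C array of color indices; the target score is left unchanged
    # yields one relabelled copy of the board for each of the 5! = 120 permutations
    for perm in permutations(range(L)):
        mapping = np.array(perm)
        yield mapping[colors], target

For the second idea, the only construction I can think of is in the spirit of permutation-invariant ("Deep Sets"-like) architectures: apply the same small network to each of the L color planes and combine the results with a symmetric operation such as a sum, so the output cannot depend on how the colors are labelled. I do not know how to express this in pybrain (it seems to have shared connections that tie weights, but I have not tried them), so the sketch below is plain numpy, just to illustrate the constraint:

def phi(plane, W, b):
    # small shared feature extractor applied to the binary mask of one color
    return np.tanh(np.dot(plane, W) + b)

def color_invariant_features(onehot, W, b):
    # onehot: (R*C, L) binary matrix, one column per color
    # summing the per-color features makes the result invariant under any
    # relabelling of the L colors, because the sum ignores the column order
    return sum(phi(onehot[:, c], W, b) for c in range(L))

W = np.random.randn(64, 32)   # 64 = R*C for an 8x8 board, 32 hidden features (illustrative sizes)
b = np.zeros(32)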
Present Implementation
I use a very standard supervised-learning setup in pybrain. The dataset DS has R*C*(L+2) binary inputs, and the target is normalized (mean subtracted, divided by the standard deviation). There are two hidden layers, and everything is fully connected.
The dataset DS is split 80-20 into TrainDS and TestDS.
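For completeness, DS is built more or less like this (a sketch with placeholder data, just to show the shapes; the variable names are illustrative):

import numpy as np
from pybrain.datasets import SupervisedDataSet

R, C, L = 8, 8, 5
n_games = 1000

# placeholder data, only to show the shape of the dataset
inputs = np.random.randint(0, 2, size=(n_games, R * C * (L + 2)))
raw_points = np.random.rand(n_games) * 1000

# normalize the target: subtract the mean, divide by the standard deviation
targets = (raw_points - raw_points.mean()) / raw_points.std()

DS = SupervisedDataSet(R * C * (L + 2), 1)
for x, y in zip(inputs, targets):
    DS.addSample(x, y)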
import numpy as np
from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, FullConnection
from pybrain.supervised.trainers import BackpropTrainer

# network: R*C*(L+2) linear inputs, two sigmoid hidden layers (100 and 20 units), one sigmoid output
nn = FeedForwardNetwork()
inLayer = LinearLayer(R * C * (L + 2))
hidden1 = SigmoidLayer(100)
hidden2 = SigmoidLayer(20)
outLayer = SigmoidLayer(1)
nn.addInputModule(inLayer)
nn.addModule(hidden1)
nn.addModule(hidden2)
nn.addOutputModule(outLayer)

# all layers fully connected
in_to_hidden = FullConnection(inLayer, hidden1)
hidden1_to_hidden2 = FullConnection(hidden1, hidden2)
hidden2_to_out = FullConnection(hidden2, outLayer)
nn.addConnection(in_to_hidden)
nn.addConnection(hidden1_to_hidden2)
nn.addConnection(hidden2_to_out)
nn.sortModules()

# 80-20 split, then backprop training on the training set only
TrainDS, TestDS = DS.splitWithProportion(0.8)
trainer = BackpropTrainer(nn, dataset=TrainDS, momentum=0.1, verbose=True, weightdecay=0.01)
trainer.trainEpochs(100)  # the number of epochs is arbitrary here

# average absolute error on the (normalized) targets
print(sum(np.abs(nn.activateOnDataset(TrainDS) - TrainDS.data['target'][:len(TrainDS)])) / len(TrainDS))
print(sum(np.abs(nn.activateOnDataset(TestDS) - TestDS.data['target'][:len(TestDS)])) / len(TestDS))
I get an average error of 0.70 on the training set and 0.77 on the test set.