
I am attempting to write a neural network to play chess, but I am running into problems with the output. I am using the python-chess library and have built in rewards. The network has three fully connected layers and 4 outputs. Each output should map to a value between 0 and 7: the first two and the last two outputs each map to a square's rank and file. To squash each output into that range, I apply a sigmoid and multiply by 7.

The problem is that after a few training epochs the learning hits a wall. The network produces the same output regardless of the initial seed, something like 3443 or 4333, and the outputs before the sigmoid are all very close to zero. My guess is that the negative reward from failed moves, combined with the sigmoid derivative, is pushing the pre-activations toward 0, which then yields 3s and 4s after squashing. I need this network to learn through reinforcement learning, but this is stalling its learning badly.
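To make the symptom concrete, here is a small illustration (mine, not part of the original code) of why near-zero pre-activations collapse to 3s and 4s: sigmoid(0) = 0.5, so every squashed output sits near 0.5 * 7 = 3.5 and rounds to 3 or 4.

    import torch

    # Pre-sigmoid outputs that are all very close to zero, as observed
    pre = torch.tensor([0.05, -0.02, 0.01, -0.04])

    squashed = torch.sigmoid(pre) * 7   # the squash used in the question
    print(squashed)                     # every value hovers around 3.5
    print(squashed.round())             # tensor([4., 3., 4., 3.])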

Code:

def forward(self, x):
    # Three stacked fully connected layers with ReLU activations
    x = F.relu(self.affine1(x))
    x = F.relu(self.action_head(x))
    x = F.relu(self.action_head2(x))
    action_scores = self.action_head3(x)
    state_values = self.value_head(x)
    print(action_scores)  # debug: these stay very close to zero
    # Squash each of the 4 outputs into [0, 7].
    # F.sigmoid is deprecated; torch.sigmoid is the current API.
    return torch.sigmoid(action_scores) * 7, state_values
  • When the network is supposed to output discrete values, it's often better to use a softmax activation over vectors where each element represents one of the possible values, then select the discrete output with argmax, instead of quantizing sigmoid outputs like you're doing. Have you tried using [one-hot encoding](https://jacobkimmel.github.io/pytorch_onehot/) to represent each of your 4 outputs? You would have 4 vectors, each of length 8 (one slot per value 0-7), where exactly one element is `1` and the other seven are `0`. – 0xsx Jul 14 '18 at 05:23
  • Using softmax is a good idea; I could reshape a 32-length output vector to 4x8 and argmax each row after the softmax (see the sketch after these comments). One-hot encoding would be tricky because I don't know which move I want, so I couldn't tell it which spot to be 1 and which spots to leave as 0. I will definitely try the softmax, thanks! – Superman Jul 14 '18 at 14:59
  • What RL algorithm are you using? – desert_ranger Jul 19 '22 at 14:55
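A minimal sketch of the softmax-head idea from the comments, assuming a backbone similar to the one in the question; the class name, layer sizes, and input size here are illustrative assumptions, not from the original model:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ChessPolicy(nn.Module):
        def __init__(self, input_size, hidden_size=128):
            super().__init__()
            self.affine1 = nn.Linear(input_size, hidden_size)
            self.hidden = nn.Linear(hidden_size, hidden_size)
            # 4 coordinates (from-rank, from-file, to-rank, to-file),
            # each a distribution over the 8 possible values 0-7
            self.action_head = nn.Linear(hidden_size, 4 * 8)
            self.value_head = nn.Linear(hidden_size, 1)

        def forward(self, x):
            x = F.relu(self.affine1(x))
            x = F.relu(self.hidden(x))
            # Reshape the 32 logits to 4 rows of 8 and softmax each row
            logits = self.action_head(x).view(-1, 4, 8)
            probs = F.softmax(logits, dim=-1)
            return probs, self.value_head(x)

    policy = ChessPolicy(input_size=64)
    probs, value = policy(torch.randn(1, 64))
    coords = probs.argmax(dim=-1)  # shape (1, 4): discrete values in 0-7

For reinforcement learning, one would typically sample each coordinate from torch.distributions.Categorical(probs) during training rather than take the argmax, so the policy still explores.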

0 Answers