I am attempting to write a neural network to play chess using the python-chess library, with rewards built in. The network has three fully connected layers and 4 outputs; each output should map to a value between 0 and 7, where the first two outputs encode one square's rank and file and the last two encode the other square's. To squash each output into that range, I apply a sigmoid and multiply by 7.

The problem is that after a few learning epochs, learning hits a wall. Regardless of the initial seed, the network produces the same output, something like 3443 or 4333, and the pre-sigmoid activations are all very close to zero. My guess is that the negative reward from failed moves, combined with the sigmoid derivative, is driving the outputs toward 0, which then becomes 3s and 4s after squashing. I need this network to learn through reinforcement learning, but this is stalling its learning badly.
Code:
def forward(self, x):
    # Three stacked fully connected layers with ReLU activations
    x = F.relu(self.affine1(x))
    x = F.relu(self.action_head(x))
    x = F.relu(self.action_head2(x))
    action_scores = self.action_head3(x)
    state_values = self.value_head(x)
    # torch.sigmoid replaces the deprecated F.sigmoid;
    # squash each output into [0, 7] to get rank/file coordinates
    return torch.sigmoid(action_scores) * 7, state_values
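A minimal sketch of the collapse described above (the logit values here are hypothetical, chosen only to illustrate "pre-sigmoid outputs super close to zero"): when the logits shrink toward zero, sigmoid(0) = 0.5, so every coordinate lands near 0.5 * 7 = 3.5, which rounds to 3 or 4 and produces outputs like 3443 or 4333:

```python
import torch

# Hypothetical near-zero logits, as observed before the sigmoid
logits = torch.tensor([0.02, -0.01, 0.03, -0.02])

# The squashing used in forward(): sigmoid then scale to [0, 7]
squares = torch.sigmoid(logits) * 7
print(squares)  # every value hovers around 3.5

# Rounding to the nearest integer square coordinate yields only 3s and 4s
print(torch.round(squares))
```

Because sigmoid saturates and its gradient is largest at 0, uniformly negative rewards can keep pulling the logits toward this fixed point at the center of the board.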