I am trying to train a neural net to play Tic Tac Toe via reinforcement learning, using Keras and Python.
Currently the net takes the current board as its input:
array([0,1,0,-1,0,1,0,0,0])
1 = X
-1 = O
0 = an empty field
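
As an illustration of this encoding, here is a minimal sketch in plain Python. The helper name `encode_board` and the string notation for the board (`X`, `O`, `_`) are assumptions made for this example, not part of my actual code; the resulting list can be wrapped in a NumPy array before it is fed to the net.

```python
# Hypothetical helper: map a 9-character board string to the numeric encoding.
# 'X' -> 1, 'O' -> -1, '_' (empty field) -> 0
SYMBOL_TO_VALUE = {"X": 1, "O": -1, "_": 0}

def encode_board(board):
    """Return the board as a flat list of 9 values, read row by row."""
    if len(board) != 9:
        raise ValueError("a Tic Tac Toe board has exactly 9 fields")
    return [SYMBOL_TO_VALUE[symbol] for symbol in board]

encoded = encode_board("_X_O_X___")
print(encoded)
```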
If the net wins a game, it gets a positive reward for every action (output) it took during that game, e.g. [0,0,0,0,1,0,0,0,0] for a move on the center field.
If the net loses, I want to train it with a negative reward instead, e.g. [0,0,0,0,-1,0,0,0,0].
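
To make the reward scheme concrete, this is roughly how such target vectors could be built. It is only a sketch in plain Python; the helper name `reward_target` and its parameters are hypothetical and chosen for this example.

```python
# Hypothetical helper: build a 9-element training target for one move.
# All fields are 0 except the field that was played, which holds the reward.
def reward_target(action_index, reward):
    """Return a target vector with `reward` at the played field."""
    if not 0 <= action_index < 9:
        raise ValueError("action_index must be between 0 and 8")
    target = [0.0] * 9
    target[action_index] = float(reward)
    return target

win_target = reward_target(4, 1)    # net played the center field and won
loss_target = reward_target(4, -1)  # net played the center field and lost
```

Each such target would be paired with the board state in which the move was made, one pair per move of the game.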
But currently I get a lot of 0.000e-000 accuracies.
Can I train with a "bad reward" at all? Or, if I can't do it with -1, how should I do it instead?
Thanks in advance.