I am toying around with machine learning, especially Q-learning, where you have a state and actions and give rewards depending on how well the network did.
For starters I set myself a simple goal: train a network so that it emits valid moves for tic-tac-toe (against a random opponent) as actions. My problem is that the network does not learn at all, or even gets worse over time.
The first thing I did was get familiar with Torch and a deep Q-learning module for it: https://github.com/blakeMilner/DeepQLearning
Then I wrote a simple tic-tac-toe game in which a random player competes with the neural net, and plugged it into the code from this sample: https://github.com/blakeMilner/DeepQLearning/blob/master/test.lua
The network has 9 output nodes, one per cell; the chosen node determines which cell the network sets.
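To make the setup concrete, here is a minimal sketch of the two pieces described above, written in Python rather than the Lua actually used. All names (`random_opponent_move`, `action_to_cell`, the cell constants) are made up for illustration, and it assumes the board is stored as a flat list of 9 cells:

```python
import random

# Hypothetical cell values, matching the 0/1/2 encoding used in this post.
EMPTY, PLAYER1, PLAYER2 = 0, 1, 2

def random_opponent_move(board, rng=random):
    """Return the index of a randomly chosen empty cell, or None if the board is full."""
    empty_cells = [i for i, cell in enumerate(board) if cell == EMPTY]
    return rng.choice(empty_cells) if empty_cells else None

def action_to_cell(action):
    """Map an output node index (0..8) to a (row, column) pair on the 3x3 board."""
    return divmod(action, 3)
```

The network's action is just an index from 0 to 8, so `action_to_cell` is only needed for display; the rest of the game can work on the flat list directly.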
A move is valid if the network chooses an empty cell (no X or O in it). Accordingly, I give a positive reward if the network chooses an empty cell and a negative reward if it chooses an occupied one.
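As a sketch, the reward rule amounts to something like the following Python function. The name `move_reward` and the values +1 and -1 are placeholders for illustration only; several different reward values were tried, and the real code is in Lua:

```python
# Hypothetical cell values, matching the 0/1/2 encoding used in this post.
EMPTY, PLAYER1, PLAYER2 = 0, 1, 2

def move_reward(board, action, valid_reward=1.0, invalid_reward=-1.0):
    """Reward for choosing cell `action` (0..8) on a flat 9-cell board.

    The move counts as valid only if the chosen cell is empty.
    """
    return valid_reward if board[action] == EMPTY else invalid_reward

# Example position: player 1 in the top-left corner, player 2 in the centre.
board = [PLAYER1, EMPTY, EMPTY,
         EMPTY, PLAYER2, EMPTY,
         EMPTY, EMPTY, EMPTY]
```

With this board, `move_reward(board, 1)` gives the positive reward and `move_reward(board, 0)` gives the negative one. Setting `invalid_reward=0.0` corresponds to the "only rewards for success" variant.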
The problem is it never seems to learn. I tried lots of variations:
- mapping the tic-tac-toe field as 9 inputs (0 = cell empty, 1 = player 1, 2 = player 2) or as 27 one-hot inputs, three per cell (e.g. an empty cell becomes [empty = 1, player1 = 0, player2 = 0])
- varying the number of hidden nodes between 10 and 60
- training for up to 60k iterations
- varying the learning rate between 0.001 and 0.1
- giving negative rewards for failures or only rewards for successes, with different reward values
Nothing works :(
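For reference, the two input encodings from the first bullet look roughly like this in Python. This is an illustrative sketch, not the Lua code actually used; `encode_raw` and `encode_one_hot` are made-up names, and the board is assumed to be a flat list of 9 cell values:

```python
# Hypothetical cell values, matching the 0/1/2 encoding used in this post.
EMPTY, PLAYER1, PLAYER2 = 0, 1, 2

def encode_raw(board):
    """9 inputs: one number per cell (0 = empty, 1 = player 1, 2 = player 2)."""
    return [float(cell) for cell in board]

def encode_one_hot(board):
    """27 inputs: three per cell, ordered [empty, player1, player2]."""
    encoded = []
    for cell in board:
        one_hot = [0.0, 0.0, 0.0]
        one_hot[cell] = 1.0  # the cell value doubles as the index of the active input
        encoded.extend(one_hot)
    return encoded
```

In the 27-input variant every input is either 0 or 1, whereas the 9-input variant puts an arbitrary ordering on the three cell states.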
Now I have a couple of questions:
- Since this is my very first attempt at Q-learning, is there anything I am doing fundamentally wrong?
- Which parameters are worth changing? The "Brain" has a lot of them: https://github.com/blakeMilner/DeepQLearning/blob/master/deepqlearn.lua#L57
- What would be a good number of hidden nodes?
- Is the simple network structure defined at https://github.com/blakeMilner/DeepQLearning/blob/master/deepqlearn.lua#L116 too simple for this problem?
- Am I just too impatient, and do I need to train for many more iterations?
Thank you,
-Matthias