I have implemented Q-learning without neural networks, but I am stuck on implementing it with a neural network.
Here is pseudocode showing how my Q-learning is implemented:
    train(int iterations):
        buffer = empty buffer
        for i = 0 while i < iterations:
            move = null
            if random(0,1) > threshold:
                move = random_move()
            else:
                move = network_calculate_move()
            input_to_network = game.getInput()
            output_of_network = network.calculate(input_to_network)
            game.makeMove(move)
            reward = game.getReward()
            maximum_next_q_value = max(network.calculate(game.getInput()))
            if reward is 1 or -1: //either lost or won
                output_of_network[move] = reward
            else:
                output_of_network[move] = reward + discount_factor * maximum_next_q_value
            buffer.add(input_to_network, output_of_network)
            if buffer is full:
                buffer.remove_oldest()
            train_network(buffer)

    train_network(buffer b):
        batch = b.extract_random_batch(batch_size)
        for each input,output in batch:
            network.train(input, output, learning_rate) //one forward/backward pass
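To make the pseudocode concrete, here is a minimal Python sketch of three of its pieces: the move selection, the target computation, and the buffer. It is only an illustration under assumptions. The names `choose_move`, `q_target` and `ReplayBuffer` are hypothetical, Q-values are plain lists of floats, and the network itself is left out. The buffer uses `deque(maxlen=...)`, which drops the oldest entry automatically and is assumed to be equivalent to the explicit `remove_oldest()` call.

```python
import random
from collections import deque


def choose_move(q_values, threshold):
    """Pick a move as in the pseudocode: random if random(0,1) > threshold, else greedy."""
    if random.random() > threshold:
        return random.randrange(len(q_values))
    # Greedy move: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda i: q_values[i])


def q_target(q_values, move, reward, next_q_values, discount_factor, terminal):
    """Copy q_values and overwrite only the entry of the move that was played."""
    target = list(q_values)
    if terminal:
        # Game over (won or lost): there is no next state to bootstrap from.
        target[move] = reward
    else:
        target[move] = reward + discount_factor * max(next_q_values)
    return target


class ReplayBuffer:
    """Fixed-size buffer of (network_input, target) pairs (hypothetical helper)."""

    def __init__(self, capacity):
        # maxlen makes the deque discard the oldest sample once it is full.
        self.samples = deque(maxlen=capacity)

    def add(self, network_input, target):
        self.samples.append((network_input, target))

    def extract_random_batch(self, batch_size):
        # Sample without replacement; never ask for more than is stored.
        k = min(batch_size, len(self.samples))
        return random.sample(list(self.samples), k)
```

In this sketch, `train_network` would call `extract_random_batch` once per iteration and run one training pass on each returned pair, as in the pseudocode.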
My problem right now is that this code only works for buffer sizes below 200. With any buffer larger than 200, it no longer works, so I have a few questions:
- Is this implementation correct (in theory)?
- How big should the batch size be compared to the buffer size?
- How would one usually train the network? For how long? Until a specific MSE over the whole batch is reached?