
I implemented DQN from scratch in Java; everything is custom made. I made it play Snake, and the results are really good. But I have a problem.

To make the network as stable as possible, I'm using replay memory and also a target network. The network converges really well, but after some time it just breaks.

This is a graph (X: games played, Y: average points scored):


This 'break' usually happens a few games after I update the target network with the policy network.

Settings I use for DQN (a simplified sketch of the training loop follows the list):

 discount factor: 0.9
 learning rate: 0.001
 steps to update target network: 300 000 (every 300k steps I copy the policy network into the target network)
 replay memory size: 300 000
 replay memory batch size: 256 (every step I take 256 samples from replay memory and train the network on them)
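
To make the setup concrete, here is a simplified sketch of one training step with these settings. `Transition`, `QNetwork`, `ReplayMemory` and their method names are placeholders, not my actual classes:

```java
import java.util.Random;

// Placeholder types standing in for my custom implementation.
record Transition(double[] state, int action, double reward, double[] nextState, boolean done) {}

interface QNetwork {
    double[] forward(double[] state);                       // Q-values for all actions
    void fit(double[] state, double[] targetQ, double lr);  // one gradient step toward targetQ
    void copyWeightsFrom(QNetwork other);                   // hard target-network sync
}

class ReplayMemory {
    private final Transition[] buffer = new Transition[300_000];  // replay memory size
    private final Random rng = new Random();
    private int size = 0, next = 0;

    void add(Transition t) {                          // ring buffer: overwrite the oldest entry
        buffer[next] = t;
        next = (next + 1) % buffer.length;
        size = Math.min(size + 1, buffer.length);
    }

    Transition[] sample(int batchSize) {              // uniform random sampling
        Transition[] batch = new Transition[batchSize];
        for (int i = 0; i < batchSize; i++) batch[i] = buffer[rng.nextInt(size)];
        return batch;
    }

    int size() { return size; }
}

class Trainer {
    static double max(double[] xs) {
        double m = xs[0];
        for (double x : xs) m = Math.max(m, x);
        return m;
    }

    // Called once per environment step.
    static void trainStep(ReplayMemory memory, QNetwork policy, QNetwork target, long step) {
        if (memory.size() < 256) return;
        for (Transition t : memory.sample(256)) {                  // batch size 256
            double[] targetQ = policy.forward(t.state());
            targetQ[t.action()] = t.done()
                    ? t.reward()
                    : t.reward() + 0.9 * max(target.forward(t.nextState()));  // discount factor 0.9
            policy.fit(t.state(), targetQ, 0.001);                 // learning rate 0.001
        }
        if (step % 300_000 == 0) target.copyWeightsFrom(policy);   // hard copy every 300k steps
    }
}
```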

Any ideas what could be wrong? Thanks in advance for any answers.


1 Answer


Look up "catastrophic forgetting".

Try adjusting your replay-memory size and the number of steps to update your target network.
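
If the jump at each hard copy is what destabilizes training, one common alternative to a shorter sync interval is a soft (Polyak) target update: blend a small fraction of the policy weights into the target every step instead of doing one big copy every 300k steps. A rough sketch over flat weight arrays (the flat-array layout is just an assumption about your custom network, not part of your code):

```java
final class SoftTargetUpdate {
    /**
     * Soft (Polyak) update over flat parameter arrays:
     *   target <- tau * policy + (1 - tau) * target
     * Applied every training step with a small tau, the target network tracks
     * the policy network gradually instead of jumping to it all at once.
     */
    static void apply(double[] policyWeights, double[] targetWeights, double tau) {
        for (int i = 0; i < targetWeights.length; i++) {
            targetWeights[i] = tau * policyWeights[i] + (1.0 - tau) * targetWeights[i];
        }
    }
}
```

With tau around 0.001, the target roughly lags the policy by about 1/tau training steps, so you can tune it together with (or instead of) your 300k-step interval.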