
I implemented DQN from scratch in Java; everything is custom made. I made it play Snake, and the results are really good. But I have a problem.

To make the network as stable as possible, I'm using replay memory and also a target network. The network converges really well, but after some time it just breaks.

This is a graph (X: games played, Y: average points scored):


This 'break' usually happens a few games after I update the target network with the policy network.

Settings I use for DQN (a simplified sketch of the training loop follows the list):

 discount factor: 0.9
 learning rate: 0.001
 steps to update target network: 300 000 (every 300k steps I copy the policy network into the target network)
 replay memory size: 300 000
 replay memory batch size: 256 (every step I take 256 samples from replay memory and train the network on them)
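
To make the setup concrete, here is a simplified sketch of one training step with these settings. `Transition`, `QNetwork`, `ReplayMemory` and their method names are placeholders, not my actual classes:

```java
import java.util.Random;

// Placeholder types standing in for my custom implementation.
record Transition(double[] state, int action, double reward, double[] nextState, boolean done) {}

interface QNetwork {
    double[] forward(double[] state);                       // Q-values for all actions
    void fit(double[] state, double[] targetQ, double lr);  // one gradient step toward targetQ
    void copyWeightsFrom(QNetwork other);                   // hard target-network sync
}

class ReplayMemory {
    private final Transition[] buffer = new Transition[300_000];  // replay memory size
    private final Random rng = new Random();
    private int size = 0, next = 0;

    void add(Transition t) {                          // ring buffer: overwrite the oldest entry
        buffer[next] = t;
        next = (next + 1) % buffer.length;
        size = Math.min(size + 1, buffer.length);
    }

    Transition[] sample(int batchSize) {              // uniform random sampling
        Transition[] batch = new Transition[batchSize];
        for (int i = 0; i < batchSize; i++) batch[i] = buffer[rng.nextInt(size)];
        return batch;
    }

    int size() { return size; }
}

class Trainer {
    static double max(double[] xs) {
        double m = xs[0];
        for (double x : xs) m = Math.max(m, x);
        return m;
    }

    // Called once per environment step.
    static void trainStep(ReplayMemory memory, QNetwork policy, QNetwork target, long step) {
        if (memory.size() < 256) return;
        for (Transition t : memory.sample(256)) {                  // batch size 256
            double[] targetQ = policy.forward(t.state());
            targetQ[t.action()] = t.done()
                    ? t.reward()
                    : t.reward() + 0.9 * max(target.forward(t.nextState()));  // discount factor 0.9
            policy.fit(t.state(), targetQ, 0.001);                 // learning rate 0.001
        }
        if (step % 300_000 == 0) target.copyWeightsFrom(policy);   // hard copy every 300k steps
    }
}
```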

Any ideas what could be wrong? Thanks in advance for any answers.


1 Answer


Look up "catastrophic forgetting".

Try adjusting your replay-memory size and the number of steps to update your target network.
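
If the jump at each hard copy is what destabilizes training, one common alternative to a shorter sync interval is a soft (Polyak) target update: blend a small fraction of the policy weights into the target every step instead of doing one big copy every 300k steps. A rough sketch over flat weight arrays (the flat-array layout is just an assumption about your custom network, not part of your code):

```java
final class SoftTargetUpdate {
    /**
     * Soft (Polyak) update over flat parameter arrays:
     *   target <- tau * policy + (1 - tau) * target
     * Applied every training step with a small tau, the target network tracks
     * the policy network gradually instead of jumping to it all at once.
     */
    static void apply(double[] policyWeights, double[] targetWeights, double tau) {
        for (int i = 0; i < targetWeights.length; i++) {
            targetWeights[i] = tau * policyWeights[i] + (1.0 - tau) * targetWeights[i];
        }
    }
}
```

With tau around 0.001, the target roughly lags the policy by about 1/tau training steps, so you can tune it together with (or instead of) your 300k-step interval.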