I am training an agent with DQN. The reward is increasing and the loss is decreasing, which I take as a sign that training is going well. However, I have a doubt, because the loss decreases for a while and then suddenly jumps to a much higher value.

Here are the first 20 epochs:

===============================
Reward: 0.0 Steps: 0.0 Update: 1 Time: 1.2 Episodes: 1
Loss: 19796.0547
===============================
Reward: 13243.5 Steps: 100.0 Update: 3 Time: 5.33 Episodes: 2
Loss: 19431.1680
===============================
Reward: 13507.0 Steps: 100.0 Update: 6 Time: 5.56 Episodes: 3
Loss: 19586.0059
===============================
Reward: 13469.5 Steps: 100.0 Update: 9 Time: 5.96 Episodes: 4
Loss: 19398.0176
===============================
Reward: 13923.5 Steps: 100.0 Update: 12 Time: 6.34 Episodes: 5
Loss: 19539.2090
===============================
Reward: 13382.0 Steps: 100.0 Update: 15 Time: 6.57 Episodes: 6
Loss: 19461.4648
===============================
Reward: 14326.0 Steps: 100.0 Update: 18 Time: 6.89 Episodes: 7
Loss: 19103.9668
===============================
Reward: 15041.0 Steps: 100.0 Update: 21 Time: 7.16 Episodes: 8
Loss: 19470.4160
===============================
Reward: 15718.5 Steps: 100.0 Update: 24 Time: 7.52 Episodes: 9
Loss: 19668.2324
===============================
Reward: 14925.5 Steps: 100.0 Update: 27 Time: 8.0 Episodes: 10
Loss: 19771.4648
===============================
Reward: 15555.0 Steps: 100.0 Update: 30 Time: 8.12 Episodes: 11
Loss: 19788.6621
===============================
Reward: 14711.0 Steps: 100.0 Update: 33 Time: 8.52 Episodes: 12
Loss: 19724.0176
===============================
Reward: 15329.5 Steps: 100.0 Update: 36 Time: 9.03 Episodes: 13
Loss: 19551.4707
===============================
Reward: 15748.0 Steps: 100.0 Update: 39 Time: 9.17 Episodes: 14
Loss: 19516.3770
===============================
Reward: 15666.5 Steps: 100.0 Update: 42 Time: 9.39 Episodes: 15
Loss: 19426.6973
===============================
Reward: 15593.5 Steps: 100.0 Update: 45 Time: 9.85 Episodes: 16
Loss: 19327.2832
===============================
Reward: 15614.0 Steps: 100.0 Update: 48 Time: 10.13 Episodes: 17
Loss: 19158.5488
===============================
Reward: 15874.5 Steps: 100.0 Update: 51 Time: 10.47 Episodes: 18
Loss: 19061.7402
===============================
Reward: 15575.5 Steps: 100.0 Update: 54 Time: 10.68 Episodes: 19
Loss: 18895.0918
===============================
Reward: 15949.5 Steps: 100.0 Update: 57 Time: 11.01 Episodes: 20
Loss: 18741.6094

After 37 epochs, the reward reached ~17000 and the loss decreased to 15694.

Here you can see the big jump in the loss. It happened 3 times over 100 episodes:

Reward: 16366.0 Steps: 100.0 Update: 117 Time: 17.44 Episodes: 40
Loss: 15099.0156
===============================
Reward: 15909.5 Steps: 100.0 Update: 120 Time: 17.9 Episodes: 41
Loss: 14892.0322
===============================
Reward: 16744.5 Steps: 100.0 Update: 123 Time: 17.87 Episodes: 42
Loss: 14705.1650
===============================
Reward: 16613.5 Steps: 100.0 Update: 126 Time: 18.39 Episodes: 43
Loss: 14518.6943
===============================
Reward: 16422.0 Steps: 100.0 Update: 129 Time: 18.8 Episodes: 44
Loss: 19189.0879
===============================
Reward: 16820.5 Steps: 100.0 Update: 132 Time: 19.27 Episodes: 45
Loss: 28676.2344
===============================
Reward: 16513.5 Steps: 100.0 Update: 135 Time: 19.66 Episodes: 46
Loss: 28341.6875
===============================
Reward: 16878.5 Steps: 100.0 Update: 138 Time: 20.08 Episodes: 47
Loss: 27986.1465

I expected the loss to keep decreasing or to stabilize. How can I explain the jump in the loss, and how can I avoid it?

fgauth

1 Answer

This may be the exploding gradient problem, in which the loss suddenly becomes very large during training. You could try L2 regularization (https://keras.io/regularizers/) and gradient clipping. You could also experiment with the learning rate, for example by decreasing it, or switch to another optimizer (e.g. plain SGD instead of Adam or whatever you're using). If you're using recurrent cells, maybe try LSTMs instead of GRUs. Here you can find some more information and ideas about how to fix it.
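
To make the clipping and regularization suggestions concrete, here is a minimal, framework-free sketch in plain Python. It is only an illustration, not your training code: the helper names `clip_by_global_norm` and `sgd_step`, and the default values for `lr`, `max_norm` and `l2`, are assumptions made up for this example. In Keras itself you would more likely pass `clipnorm` or `clipvalue` to the optimizer and a `kernel_regularizer` to the layers rather than write this by hand.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their combined L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already small enough (this also covers the all-zero case).
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


def sgd_step(weights, grads, lr=0.01, max_norm=1.0, l2=1e-4):
    """One plain SGD update with an L2 penalty and gradient clipping.

    Hypothetical helper: lr, max_norm and l2 are illustrative defaults.
    """
    # The penalty l2 * sum(w^2) adds 2 * l2 * w to each gradient.
    grads = [g + 2.0 * l2 * w for g, w in zip(grads, weights)]
    # Clip after adding the penalty so the whole update is bounded.
    clipped, _ = clip_by_global_norm(grads, max_norm)
    return [w - lr * g for w, g in zip(weights, clipped)]


if __name__ == "__main__":
    # A huge gradient moves the weights no further than lr * max_norm.
    print(sgd_step([1.0, -1.0], [300.0, 400.0], lr=0.1, max_norm=1.0, l2=0.0))
```

With clipping in place, a single bad batch can move the weights by at most `lr * max_norm`, so one outlier update is less likely to throw the Q-values (and the loss) far off. Logging the pre-clipping norm returned by `clip_by_global_norm` is also a cheap way to check whether the jumps in the loss really coincide with very large gradients.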

Syrius