
The diagram below shows the training loss values against epochs. Based on the diagram, does it mean I have made the model over-fit? If not, what is causing the spikes in the loss values across epochs? Overall, it can be observed that the loss value is on a decreasing trend. How should I tune my settings in deep Q-learning?


Yeo Keat

1 Answer


Such a messy loss trajectory would usually mean that the learning rate is too high for the given smoothness of the loss function.

https://www.jeremyjordan.me/nn-learning-rate/

An alternative interpretation is that the loss function is not at all predictive of the success at the given task.
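If the learning rate is indeed the culprit, the usual remedies in deep Q-learning are a smaller learning rate, a Huber loss instead of MSE, gradient clipping, and a target network that is updated only occasionally. The sketch below is not the asker's code; the network sizes, the learning rate of 1e-4, and the clip value of 10.0 are illustrative assumptions.

```python
# Minimal DQN update step showing the knobs that usually tame spiky losses.
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99  # assumed toy dimensions

q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)  # try lowering this first

def update(batch, step, target_update_every=1000):
    obs, actions, rewards, next_obs, dones = batch

    # TD target computed from the frozen target network
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q

    q_pred = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Huber loss is less sensitive to occasional huge TD errors than MSE
    loss = nn.functional.smooth_l1_loss(q_pred, td_target)

    optimizer.zero_grad()
    loss.backward()
    # Clip gradients so a single bad batch cannot blow up the weights
    nn.utils.clip_grad_norm_(q_net.parameters(), max_norm=10.0)
    optimizer.step()

    # Refresh the target network only every few thousand steps
    if step % target_update_every == 0:
        target_net.load_state_dict(q_net.state_dict())

    return loss.item()
```

Even with these settings, some variance in the loss curve is expected in RL, since the data distribution shifts as the policy improves.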

Stefan Dragnev
  • Reinforcement learning is a bit different from normal supervised learning, as it typically shows large variance like in the question. I would not say it is a problem with the OP's setup, but rather with the whole field – BlackBear Mar 31 '20 at 16:13