The diagram below shows my training loss values against epochs. Based on the diagram, does it mean my model is over-fitting? If not, what is causing the spikes in the loss values across epochs? Overall, the loss is clearly in a decreasing trend. How should I tune my settings in deep Q-learning?
1 Answer
Such a noisy loss trajectory would usually mean that the learning rate is too high for the smoothness of the loss landscape: each step overshoots, producing spikes even while the overall trend decreases.
An alternative interpretation is that the loss function is simply not predictive of success at the task, which is common in deep Q-learning, where the regression targets themselves shift as the network is updated.
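To make the first point concrete, here is a minimal toy sketch (not the OP's setup): minimizing f(w) = w² with noisy gradients, which stand in for the stochasticity of minibatch or replay-buffer sampling. A step size near the stability limit produces a spiky curve; a smaller one settles smoothly, under identical gradient noise.

```python
import random

def sgd_loss_curve(lr, steps=200, seed=0):
    """Minimise f(w) = w**2 with noisy gradients g = 2w + noise.

    Returns one loss value per step. The Gaussian noise stands in for
    the stochasticity of minibatch / replay-buffer sampling.
    """
    rng = random.Random(seed)
    w = 5.0
    losses = []
    for _ in range(steps):
        grad = 2.0 * w + rng.gauss(0.0, 1.0)  # noisy gradient estimate
        w -= lr * grad
        losses.append(w * w)
    return losses

high = sgd_loss_curve(lr=0.9)   # near the stability limit -> spiky curve
low = sgd_loss_curve(lr=0.05)   # well inside it -> smooth decrease

# Average loss over the last 50 steps: the smaller step size settles
# much closer to the optimum despite identical gradient noise.
print(sum(high[-50:]) / 50, sum(low[-50:]) / 50)
```

If lowering the learning rate smooths the curve in your own run, that was the dominant cause; if the spikes persist, the second interpretation (a non-stationary target) is more likely.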

Stefan Dragnev
Reinforcement learning is a bit different from normal supervised learning in that it typically shows large variance, as in the question. I would not say it is a problem with the OP's setup, but rather with the whole field. – BlackBear Mar 31 '20 at 16:13