
The Adam optimizer uses a momentum-like approach so that a neural network trains in fewer gradient-descent iterations than with vanilla gradient descent. I'm trying to figure out whether the Adam optimizer works in a Q-learning setting, where the dataset is non-stationary and model.fit() is called many times. Do Adam and other optimizers retain their momentum across the different calls, or is the optimizer's state reset at each fit call? I've tried searching the code at https://github.com/keras-team/keras/blob/master/keras/optimizers.py#L436 but can't find where this information is stored or whether it's retained.
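To make the setting concrete, here is a rough sketch of the kind of training loop I mean (the network, data shapes, and target drift are made up purely for illustration):

```python
import numpy as np
import keras

# Toy Q-network: 4-dim state in, Q-values for 2 actions out.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(4,)),
    keras.layers.Dense(2),
])
model.compile(optimizer=keras.optimizers.Adam(), loss="mse")

for step in range(100):
    # Non-stationary data: each step trains on a fresh batch of transitions,
    # and the regression targets drift as the (made-up) policy changes.
    states = np.random.rand(64, 4).astype("float32")
    targets = np.random.rand(64, 2).astype("float32") + 0.01 * step
    model.fit(states, targets, epochs=1, verbose=0)
```

The question is whether Adam's moment estimates carry over from one of these `fit` calls to the next.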

    Yes, the state of the optimizer is preserved between `fit` calls. The optimizer is stored as an attribute of the model object. – today Aug 30 '19 at 14:22
  • So the momentum is part of the optimizer state? – Arjan Groen Aug 30 '19 at 14:30
  • Well, if I am not mistaken, the momentum is stored in the `self.weights` attribute (see [here](https://github.com/keras-team/keras/blob/61052bc1f1c141c5dba9f83a4af14322ec4e6d7c/keras/optimizers.py#L490)) and is then updated in the subsequent for loop (see [here](https://github.com/keras-team/keras/blob/61052bc1f1c141c5dba9f83a4af14322ec4e6d7c/keras/optimizers.py#L502)). – today Aug 30 '19 at 14:53
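Following up on the comments above, here is a minimal empirical check, assuming the standalone Keras 2.x API from the linked source, where the update counter is `optimizer.iterations` and the accumulators are returned by `optimizer.get_weights()` (newer versions expose the same state through the optimizer's variables). If the state were reset on each `fit` call, the counter would start over each time; instead it keeps increasing, because the optimizer lives on the model object rather than inside `fit()`.

```python
import numpy as np
import keras
from keras import backend as K

model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    keras.layers.Dense(2),
])
model.compile(optimizer=keras.optimizers.Adam(), loss="mse")

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 2).astype("float32")

# One batch per fit call (batch_size equals the dataset size).
model.fit(x, y, epochs=1, batch_size=32, verbose=0)
print(K.get_value(model.optimizer.iterations))   # 1 update so far

model.fit(x, y, epochs=1, batch_size=32, verbose=0)
print(K.get_value(model.optimizer.iterations))   # 2: the counter was NOT reset

# Adam's first/second moment estimates are part of the same retained state.
print(len(model.optimizer.get_weights()))
```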

0 Answers