
The Adam optimizer maintains several terms that are used to add "momentum" to the gradient descent algorithm, making the step size for each variable adaptive.

Specifically, in the case of Adam, I am referring to the m-hat and v-hat terms.
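
For reference, the standard Adam update rules (written in the usual notation of the Adam paper, not taken from my own code) are:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\hat{m}_t = m_t / (1 - \beta_1^t)
\hat{v}_t = v_t / (1 - \beta_2^t)
\theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)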

There are times, however, when you may wish to manually update their states, such as resetting training for a subset of variables. Is there a way to do this in PyTorch's momentum-based optimizers (especially Adam)?

user650261

1 Answer


Since m-hat and v-hat are computed using the "betas" parameter, you can update that parameter during training if you wish, like so:

optimizer.param_groups[0]['betas'] = (beta1, beta2)

Keep in mind that this only changes how the moving averages are updated from that point on; it does not reset the already-accumulated buffers.

However, to address your question directly: the functional implementation at https://github.com/pytorch/pytorch/blob/22b12179db15923007aaec80829766079bb0b9d1/torch/optim/_functional.py#L53 doesn't seem to expose any hook for modifying m-hat (exp_avg) and v-hat (exp_avg_sq, I presume) during the step. If you want that level of control, I think you would have to implement your own Adam optimizer function, which seems relatively straightforward if you copy the source code and modify it to suit your needs.
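
That said, torch.optim.Adam builds the exp_avg/exp_avg_sq tensors it passes into that functional code from optimizer.state, so one possible workaround for resetting a subset of variables is to edit those buffers between steps. Below is only a sketch: it assumes the state layout used by the stock implementation (per-parameter dicts with 'exp_avg', 'exp_avg_sq' and 'step' keys) and that at least one call to step() has already populated them; reset_adam_state is just a helper I'm defining here, not a torch API, and the Linear model is a placeholder.

import torch

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# ... train for a while so optimizer.state gets populated ...

def reset_adam_state(optimizer, params_to_reset):
    """Zero Adam's moment estimates (and step count) for the given parameters.

    Assumes the state layout used by the stock torch.optim.Adam:
    state[p] = {'step': ..., 'exp_avg': Tensor, 'exp_avg_sq': Tensor}.
    """
    for p in params_to_reset:
        state = optimizer.state.get(p, {})
        if not state:
            continue  # no state yet (parameter has not been stepped)
        state['exp_avg'].zero_()     # first-moment buffer (m)
        state['exp_avg_sq'].zero_()  # second-moment buffer (v)
        if 'step' in state:
            # 'step' is a plain int in older versions, a tensor in newer ones
            if torch.is_tensor(state['step']):
                state['step'].zero_()
            else:
                state['step'] = 0

# e.g. reset the accumulated state for the bias only
reset_adam_state(optimizer, [model.bias])

Resetting 'step' as well matters because the bias-correction factors (1 - beta^t) that produce m-hat and v-hat depend on it.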

jhso