The Adam optimizer maintains several running terms that add "momentum" to gradient descent and make the step size for each parameter adaptive. Specifically, in the case of Adam, I am referring to the first- and second-moment estimates (the m-hat and v-hat terms).
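For reference, this is the update I have in mind (notation as in the Adam paper):

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2
\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \qquad
\theta_t = \theta_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```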
There are times, however, when you may want to manually reset these states, for example when restarting training for a subset of the variables. Is there a way to do this with PyTorch's momentum-based optimizers (Adam in particular)?
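To make it concrete, here is a minimal sketch of the kind of thing I am after. It assumes `torch.optim.Adam` keeps its per-parameter state under the keys `exp_avg`, `exp_avg_sq`, and `step` (which is what recent versions appear to use); the helper `reset_adam_state` and the toy model are just for illustration:

```python
import torch
from torch import nn

# Toy model and optimizer; I only want to "restart" the last layer later on.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# ... training happens here, which populates optimizer.state ...

def reset_adam_state(optimizer, params_to_reset):
    """Zero Adam's moment estimates for the given parameters.

    Assumes the state keys 'exp_avg', 'exp_avg_sq', and 'step' used by
    torch.optim.Adam; other optimizers (or versions) may name them differently.
    """
    for p in params_to_reset:
        state = optimizer.state.get(p, {})
        if "exp_avg" in state:
            state["exp_avg"].zero_()       # first-moment estimate (m)
        if "exp_avg_sq" in state:
            state["exp_avg_sq"].zero_()    # second-moment estimate (v)
        if "step" in state:
            # 'step' is an int in older releases, a tensor in newer ones.
            if torch.is_tensor(state["step"]):
                state["step"].zero_()
            else:
                state["step"] = 0

# Reset the optimizer state only for the last layer's parameters.
reset_adam_state(optimizer, model[-1].parameters())
```

An alternative I have considered is simply deleting the per-parameter entries from `optimizer.state` and relying on the optimizer to lazily re-initialize them on the next `step()`, but I am not sure whether that is a supported pattern or an implementation detail.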