When is Momentum Applied in Tensorflow Gradient Tape?

Question

I've been playing around with automatic gradients in tensorflow and I had a question. If we are updating an optimizer, say ADAM, when is the momentum algorithm applied to the gradient? Is it applied when we call tape.gradient(loss,model.trainable_variables) or when we call model.optimizer.apply_gradients(zip(dtf_network,model.trainable_variables))?

Thanks!

score 0 · Answer 1 · answered Sep 17 '20 at 09:27

tape.gradient computes the gradients straightforwardly without reference to an optimizer. Since momentum is part of the optimizer, the tape does not include it. AFAIK momentum is usually implemented by adding extra variables in the optimizer that store the running average. All of this is handled in optimizer.apply_gradients.

When is Momentum Applied in Tensorflow Gradient Tape?

1 Answers1