I am using the adam_sgd optimiser to train a neural network, and I am having trouble matching the function's arguments to the parameters reported in the Adam paper. Specifically, how do the paper's alpha, beta1, beta2 and epsilon parameters relate to the learning rate and momentum arguments in the CNTK implementation of Adam?
1 Answer
- Alpha is the learning rate (the `lr` argument)
- Beta1 is the `momentum` parameter
- Beta2 is the `variance_momentum` parameter (see the code sketch below)
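
A minimal sketch of how these might map in code, assuming the CNTK 2.0-era Python API; the `cntk.learner` module path, the `learning_rate_schedule`/`momentum_schedule` helpers, and the exact `adam_sgd` keyword names may differ between CNTK versions, and `z` stands for a hypothetical model function built elsewhere:

```python
from cntk.learner import (adam_sgd, learning_rate_schedule,
                          momentum_schedule, UnitType)

# Adam paper defaults: alpha = 0.001, beta1 = 0.9, beta2 = 0.999
lr    = learning_rate_schedule(0.001, UnitType.minibatch)  # alpha -> lr
beta1 = momentum_schedule(0.9)                             # beta1 -> momentum
beta2 = momentum_schedule(0.999)                           # beta2 -> variance_momentum

# `z` is a hypothetical model function; its parameters are what we train.
learner = adam_sgd(z.parameters, lr=lr, momentum=beta1,
                   variance_momentum=beta2)
```

Note that CNTK can also accept momentum as a time constant (via `momentum_as_time_constant_schedule`), which is a different parameterization from the paper's per-minibatch beta values, so the numbers are not always directly interchangeable.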
