I am using the adam_sgd optimiser to train a neural network, and I am having trouble matching the function's arguments to the parameters reported in the Adam paper. Specifically, how do the paper's alpha, beta1, beta2 and epsilon parameters relate to the learning rate and momentum arguments in the CNTK implementation of Adam?
1 Answer
- Alpha is the learning rate (the `lr` argument)
- Beta1 is the `momentum` parameter
- Beta2 is the `variance_momentum` parameter (see the code sketch below)
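
A minimal sketch of how these might map in code, assuming the CNTK 2.0-era Python API; the `cntk.learner` module path, the `learning_rate_schedule`/`momentum_schedule` helpers, and the exact `adam_sgd` keyword names may differ between CNTK versions, and `z` stands for a hypothetical model function built elsewhere:

```python
from cntk.learner import (adam_sgd, learning_rate_schedule,
                          momentum_schedule, UnitType)

# Adam paper defaults: alpha = 0.001, beta1 = 0.9, beta2 = 0.999
lr    = learning_rate_schedule(0.001, UnitType.minibatch)  # alpha -> lr
beta1 = momentum_schedule(0.9)                             # beta1 -> momentum
beta2 = momentum_schedule(0.999)                           # beta2 -> variance_momentum

# `z` is a hypothetical model function; its parameters are what we train.
learner = adam_sgd(z.parameters, lr=lr, momentum=beta1,
                   variance_momentum=beta2)
```

Note that CNTK can also accept momentum as a time constant (via `momentum_as_time_constant_schedule`), which is a different parameterization from the paper's per-minibatch beta values, so the numbers are not always directly interchangeable.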
