I am using the MXNet library in RStudio to train a neural network model.
When training the model using caret, I can tune (among others) the "momentum" parameter. Is this related to the Stochastic Gradient Descent (SGD) optimizer?
I know that SGD is the default optimizer when training with "mx.model.FeedForward.create", but which optimizer is used when I train through caret::train?