I don't know in detail how the Stochastic Gradient Descent algorithm works, and I don't need to at the moment. What I do know is that it minimizes a loss function by computing gradients and stepping toward a local minimum. I'm using Stochastic Gradient Descent as the optimizer in a Keras project, but I don't understand what this optimizer's parameters mean. They are briefly described in the documentation, but the description isn't specific enough for me.
Could you explain these four parameters:
lr: float >= 0. Learning rate.
momentum: float >= 0. Parameter that accelerates SGD in the relevant direction and dampens oscillations.
decay: float >= 0. Learning rate decay over each update.
nesterov: boolean. Whether to apply Nesterov momentum.
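To show where I'm stuck, here is my rough reading of how these four parameters might enter a single weight update. This is a plain-Python sketch of my guess, not Keras's actual implementation, and all the names in it are mine:

```python
def sgd_step(w, grad, velocity, iteration,
             lr=0.01, momentum=0.0, decay=0.0, nesterov=False):
    """My guess at one SGD update for a single weight w with gradient grad."""
    # decay: the effective learning rate shrinks as updates accumulate
    lr_t = lr * (1.0 / (1.0 + decay * iteration))
    # momentum: a running "velocity" that accumulates past gradients
    velocity = momentum * velocity - lr_t * grad
    if nesterov:
        # nesterov: look ahead along the velocity before applying the gradient
        w = w + momentum * velocity - lr_t * grad
    else:
        w = w + velocity
    return w, velocity

# With momentum=0, decay=0, nesterov=False this reduces to plain SGD:
w, v = sgd_step(w=1.0, grad=0.5, velocity=0.0, iteration=0, lr=0.1)
# w is now 1.0 - 0.1 * 0.5
```

Is this roughly what the parameters do, or am I misreading the documentation?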
And how should I decide what values to set them to?