We use backpropagation to calculate the gradient and then gradient descent (or one of its variants) to update the weights. There are plenty of optimizers, like the ones you mention and many more.
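As a baseline, here is a minimal sketch of one plain gradient-descent step; the function name `sgd_step` and the fixed learning rate are illustrative, and the gradient is assumed to come from backpropagation.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Move the weights against the gradient, scaled by a single fixed learning rate.
    return w - lr * grad

w = np.array([0.5, -1.2])
grad = np.array([0.1, -0.3])  # hypothetical gradient from backprop
w = sgd_step(w, grad)
```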
These optimizers use an adaptive learning rate. With an adaptive learning rate we have more degrees of freedom: we can increase the learning rate along one direction and decrease it along another. They don't get stuck along one direction and can traverse the loss surface faster along some directions than others (see the AdaGrad-style sketch below).
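A minimal AdaGrad-style sketch of a per-parameter learning rate (the scheme RMSprop modifies); the names `adagrad_step`, `accum`, and the hyperparameters are assumptions for illustration.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.01, eps=1e-8):
    accum = accum + grad ** 2                     # accumulate squared gradients per parameter
    w = w - lr * grad / (np.sqrt(accum) + eps)    # larger accumulated gradients -> smaller steps
    return w, accum
```

Because `accum` is tracked element-wise, each direction gets its own effective learning rate.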
RMSprop applies a momentum-like exponential decay to the gradient history, so gradients from the distant past have less influence. It modifies the AdaGrad optimizer to perform better in the non-convex setting by changing the gradient accumulation into an exponentially weighted moving average.
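A minimal RMSprop-style sketch under the same assumptions as above; the only change from the AdaGrad sketch is that the squared-gradient history becomes an exponentially weighted moving average.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.001, decay=0.9, eps=1e-8):
    avg_sq = decay * avg_sq + (1 - decay) * grad ** 2  # exponential decay of the gradient history
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)        # per-parameter adaptive step
    return w, avg_sq
```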
Adam (adaptive moments) estimates the 1st and 2nd moments of the gradient and applies a momentum-like exponential decay to both. In addition, it uses bias correction to avoid initial instabilities of the moment estimates.
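A minimal Adam-style sketch with illustrative names: decayed estimates of the 1st and 2nd moments plus bias correction, which matters in the first few steps when the moment estimates start at zero.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad         # 1st moment: decayed mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # 2nd moment: decayed mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction (t is the step count, starting at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```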
How to choose one?
It depends on the problem we are trying to solve. The best algorithm is the one that traverses the loss surface of that problem well.
It's more empirical than mathematical.