Conceptually, consider the gradient a fixed, mathematical value from automatic differentiation.
What every optimizer other than pure SGD does is take the gradient and apply some statistical analysis to create a better gradient. In the simplest case, momentum, the gradient is averaged with previous gradients (an exponentially weighted running average). RMSProp keeps a running average of the squared gradient for each weight, which acts as a rough measure of how noisy the gradient is across batches: the noisier it is, the less RMSProp "trusts" the gradient, and so the gradient is reduced (divided by the square root of that running average, which is roughly the stdev of the gradient for that weight when its mean is near zero). Adam does both.
Then, all optimizers multiply the statistically adjusted gradient by a learning rate.
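To make the "adjust the gradient, then multiply by a learning rate" idea concrete, here is a rough sketch in plain Python for a single scalar weight. The function names (adjust_sgd, adjust_momentum, adjust_rmsprop, adjust_adam, step) and the state dict are made up for illustration, and the defaults are just commonly used values. The momentum version uses the averaging form described above, which is not exactly how every library implements it, so treat this as a simplification rather than any framework's actual code.

```python
import math

# Each adjust_* function takes the raw gradient g for one weight plus a
# per-weight state dict, and returns the "statistically adjusted" gradient.
# All names here are hypothetical; this is a sketch, not a library API.

def adjust_sgd(g, state):
    # Pure SGD: no statistics, the gradient is used as-is.
    return g

def adjust_momentum(g, state, beta=0.9):
    # Exponentially weighted running average of past gradients.
    state["m"] = beta * state.get("m", 0.0) + (1.0 - beta) * g
    return state["m"]

def adjust_rmsprop(g, state, alpha=0.99, eps=1e-8):
    # Running average of the squared gradient; divide by its square root.
    state["v"] = alpha * state.get("v", 0.0) + (1.0 - alpha) * g * g
    return g / (math.sqrt(state["v"]) + eps)

def adjust_adam(g, state, beta1=0.9, beta2=0.999, eps=1e-8):
    # Both ideas together, plus bias correction for the early steps.
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1.0 - beta1) * g
    state["v"] = beta2 * state.get("v", 0.0) + (1.0 - beta2) * g * g
    m_hat = state["m"] / (1.0 - beta1 ** t)
    v_hat = state["v"] / (1.0 - beta2 ** t)
    return m_hat / (math.sqrt(v_hat) + eps)

def step(w, g, adjust, state, lr):
    # Every optimizer ends the same way: scale the adjusted gradient by lr.
    return w - lr * adjust(g, state)
```

Usage would look something like `state = {}` once per weight, then `w = step(w, g, adjust_adam, state, lr=1e-3)` on every batch. Note that lr is still an input you have to choose in all four cases.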
So although one colloquial description of Adam is that it automatically tunes a learning rate, a more informative description is that Adam statistically adjusts gradients to be more reliable, but you still need to decide on a learning rate and how it changes during training (i.e. an LR policy). ReduceLROnPlateau, cosine decay, warmup, etc. are examples of LR policies.
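As an illustration of what an LR policy is, here is a minimal sketch of linear warmup followed by cosine decay, written as a plain function of the step number. The name lr_at and all of its parameters and defaults are assumptions for the example, not taken from any framework. In practice you would more likely use the scheduler classes your library ships with.

```python
import math

def lr_at(step, base_lr=1e-3, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Hypothetical LR policy: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Ramp linearly up to base_lr over the first warmup_steps steps.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine curve from base_lr (progress=0) down to min_lr (progress=1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The value this returns is what you would pass as the learning rate on each step. The optimizer (Adam or otherwise) decides the direction and relative size of the update, and the policy decides the overall scale over time.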
Whether you program in TF or PyTorch, the pseudocode in PyTorch's optimizer docs is my go-to for understanding the optimizer algorithms. It looks like a wall of Greek letters at first, but you'll grok it if you stare at it for a few minutes.
https://pytorch.org/docs/stable/optim.html