I'm a newbie to neural networks, so I'm a little confused about the Adam optimizer. For example, I use an MLP with an architecture like this:
I've used SGD before, so I want to ask: does Adam update the weights in each layer the same way SGD does? In the example above, does that mean there will be 2 weight updates from the output to hidden layer 2, 8 weight updates from hidden layer 2 to hidden layer 1, and finally 4 weight updates from hidden layer 1 to the input? I ask because in the example I've seen, they only update the weights from the output to hidden layer 2.
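To make my question concrete, here is my current understanding of one Adam step, sketched in Python (the function name, shapes, and hyperparameter values are my own assumptions, not taken from any particular tutorial). My assumption is that this same rule is applied to every layer's weight matrix, just like SGD:

```python
import numpy as np

def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a single weight array (hypothetical sketch).

    w    : weights of one layer (any shape)
    grad : gradient of the loss w.r.t. those weights (same shape)
    m, v : running first/second moment estimates (same shape, start at 0)
    t    : step count, starting at 1 (needed for bias correction)
    """
    m = beta1 * m + (1 - beta1) * grad        # exponential average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # exponential average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# My assumption: in backprop, the same update is applied per layer,
# e.g. to hidden-layer-1 weights (4x2) and hidden-layer-2 weights (2x2),
# each with its own m, v state.
w_hid1 = np.random.randn(4, 2)
m1, v1 = np.zeros_like(w_hid1), np.zeros_like(w_hid1)
grad1 = np.ones_like(w_hid1) * 0.5            # stand-in gradient
w_hid1, m1, v1 = adam_update(w_hid1, grad1, m1, v1, t=1)
```

Is this right, or does Adam really only touch the last layer's weights?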