Consider a neural network with two hidden layers. In this case there are three weight matrices. Let's say I'm starting the training. In the first round I set random values for all weights in the three matrices. If this is correct, I have two questions:
1- Should the training proceed from the input layer toward the output layer (left to right), or in the opposite direction?
2- In the second round of training I have to apply gradient descent to the weights. Should I apply it to all weights of all matrices and then calculate the error, or apply it weight by weight, checking after each one whether the error has decreased before moving on to the next weight, and only then go to the next training round?
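
To make the setup concrete, here is a minimal sketch of how one training round is commonly organized in textbook backpropagation. Everything specific in it is an assumption for illustration and not part of the question: the layer sizes, the sigmoid activation, the squared-error loss, the learning rate, the single training example, and the helper names (`forward`, `train_round`, and so on). In this arrangement the forward pass runs from input to output, the gradients are computed from output back to input, and all three matrices are updated together before the error is measured again.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def random_matrix(rows, cols, rng):
    # First round: every weight starts at a random value.
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def forward(weights, x):
    # Forward pass, input layer to output layer; keeps every layer's activations.
    activations = [x]
    for W in weights:
        prev = activations[-1]
        activations.append(
            [sigmoid(sum(w * a for w, a in zip(row, prev))) for row in W]
        )
    return activations

def error(weights, x, target):
    out = forward(weights, x)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))

def train_round(weights, x, target, lr):
    acts = forward(weights, x)
    # Error signal at the output layer (squared error with sigmoid output).
    delta = [(o - t) * o * (1.0 - o) for o, t in zip(acts[-1], target)]
    grads = [None] * len(weights)
    # Backward pass, output layer to input layer, using the old weights only.
    for layer in reversed(range(len(weights))):
        prev = acts[layer]
        grads[layer] = [[d * a for a in prev] for d in delta]
        if layer > 0:
            W = weights[layer]
            delta = [
                sum(W[j][i] * delta[j] for j in range(len(W)))
                * prev[i] * (1.0 - prev[i])
                for i in range(len(prev))
            ]
    # Gradient descent step applied to all weights of all matrices at once.
    return [
        [[w - lr * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, G)]
        for W, G in zip(weights, grads)
    ]

rng = random.Random(0)
sizes = [2, 3, 3, 1]  # hypothetical: 2 inputs, two hidden layers of 3, 1 output
weights = [random_matrix(sizes[i + 1], sizes[i], rng) for i in range(3)]
x, target = [0.5, -0.2], [1.0]  # hypothetical single training example

before = error(weights, x, target)
for _ in range(100):
    weights = train_round(weights, x, target, lr=0.5)
after = error(weights, x, target)
print(before, after)
```

The alternative described in question 2 (changing one weight, re-checking the error, then moving to the next weight) would need a separate forward pass per weight, which is why the sketch computes all gradients from a single backward pass instead.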