In a neural network multilayer perceptron (MLP), I understand that the main difference between Stochastic Gradient Descent (SGD) and Gradient Descent (GD) lies in how many samples are used per training step. That is, SGD iteratively picks one sample, performs a forward pass, and then backpropagates to adjust the weights, as opposed to GD, where backpropagation starts only after the forward pass has been computed over all samples.
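To make sure I have the distinction right, here is a minimal toy sketch of what I mean, assuming a single linear output neuron with squared error (the data, learning rate, and averaging in the batch gradient are just my assumptions, not anything from a library):

```python
import numpy as np

# Toy data: 8 samples, 3 features, one output neuron (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)
lr = 0.01

# SGD-style: forward pass and weight update for one sample at a time.
for xi, yi in zip(X, y):
    err = xi @ w - yi       # forward pass for a single sample
    w -= lr * err * xi      # update the weights immediately

# (Full-batch) GD-style: forward pass over all samples, then one update.
err = X @ w - y             # forward pass for every sample
grad = X.T @ err / len(X)   # here I averaged the per-sample errors -- which is exactly my question
w -= lr * grad
```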
My questions are:
- When Gradient Descent (or mini-batch Gradient Descent) is the chosen approach, how do we represent the error from a single forward pass? Assuming my network has only a single output neuron, is the error represented by averaging all the individual per-sample errors, or by summing them? (See the toy snippet after this list for what I mean by summing vs. averaging.)
- In scikit-learn's MLPClassifier, does anyone know how this error is accumulated: by averaging or by summing?
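
To pin down what I mean by "summing" vs. "averaging", here is a toy snippet with made-up per-sample error values (not taken from any library):

```python
import numpy as np

# Hypothetical per-sample errors from one full-batch forward pass.
errors = np.array([0.2, -0.5, 0.1, 0.4])

loss_sum = np.sum(errors ** 2)    # option 1: summed squared error
loss_mean = np.mean(errors ** 2)  # option 2: averaged squared error (sum / n_samples)

# The two differ only by the constant factor n_samples, which effectively rescales
# the learning rate, but I'd like to know which convention is actually used.
```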
Thank you very much.