0

I'm researching multilayer perceptrons, a kind of neural network. When I read about the back-propagation algorithm, I see that some authors suggest updating the weights immediately after computing all errors for a specific layer, while other authors explain that we need to update the weights only after we have the errors for all layers. Which is the correct approach?

1st Approach:

void BackPropagate() {
    ComputeErrorsForOutputLayer();
    UpdateWeightsOutputLayer();
    ComputeErrorsForHiddenLayer();
    UpdateWeightsHiddenLayer();
}

2nd Approach:

void BackPropagate() {
    ComputeErrorsForOutputLayer();
    ComputeErrorsForHiddenLayer();
    UpdateWeightsOutputLayer();
    UpdateWeightsHiddenLayer();
}

Thanks for everything.

lejlot
Mr Rivero
  • The output gradient concerns the current state of the weights, so it doesn't make sense to first modify the weights and then continue propagating a gradient that originates from their previous values. Therefore only the 2nd approach seems reasonable to me. Can you provide the source of the information about the 1st approach? – BartoszKP Aug 25 '13 at 12:38
  • @BartoszKP In the book ISBN 978-987-1347-51-3 (in Spanish), chapter five discusses neural networks and uses the 1st approach. In this url: http://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf the implementation of the back-propagation algorithm updates the output-layer weights immediately after the output-layer error calculation, and only afterwards computes the error for the hidden layer and updates the hidden-layer weights. – Mr Rivero Aug 26 '13 at 21:13
  • Indeed, that's what it says. However, I second both answers below – it seems there's an error in that book. – BartoszKP Aug 27 '13 at 09:55

3 Answers

5

I am pretty sure you have misunderstood the concept here. The two possible strategies are:

  • update weights after all errors for one input vector are calculated
  • update weights after all errors for all the input vectors are calculated

which is completely different from what you have written. These two methods are the per-sample and batch strategies; both have their pros and cons, and due to its simplicity the first is much more common in implementations.

Regarding your "methods", the second one is the only correct one. The process of "propagating" the error is just a computational simplification of computing the derivative of the error function, and the (basic) learning process is a steepest-descent method. If you compute the derivative only for part of the dimensions (the output layer), perform a step in that direction, and then recalculate the error derivatives according to the new values, you are no longer performing gradient descent. The only scenario where the first method is acceptable is when the weight updates do not interfere with your error computation; then it does not matter which order is used, as the two are independent.
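A minimal sketch of the 2nd (correct) order, assuming a single hidden layer with sigmoid activations and squared error; the shapes and helper names are illustrative, not the OP's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backpropagate(x, target, W_hidden, W_output, lr=0.5):
    # --- forward pass ---
    h = sigmoid(W_hidden @ x)   # hidden activations
    y = sigmoid(W_output @ h)   # output activations

    # --- compute ALL deltas first, using the current weights ---
    delta_out = (y - target) * y * (1.0 - y)
    # delta_hidden must see W_output *before* it is modified:
    delta_hidden = (W_output.T @ delta_out) * h * (1.0 - h)

    # --- only now take the steepest-descent step ---
    W_output -= lr * np.outer(delta_out, h)
    W_hidden -= lr * np.outer(delta_hidden, x)
    return W_hidden, W_output
```

Swapping the last two sections up between the delta computations (the 1st approach) would make `delta_hidden` a mixture of old and new weights, which is no longer the gradient of the error.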

lejlot
  • Thanks for the answer @lejlot. I didn't misunderstand the concepts; on three occasions I have seen authors implement the back-propagation algorithm with the 1st approach. Currently I'm using the 2nd approach; you can view my very basic MLP code at http://github.com/kellermanrivero/IA. – Mr Rivero Aug 26 '13 at 21:16
  • I have a question @lejlot. I have 100 data points, including stars and circles, on a 2D surface. These have x and y coordinates; simply put, f(x, y) = star or circle. I am using [this article](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/). I am able to find my weights with many iterations, but only for one point from the 100-point data set. How can I update my weights online? – Yunus Temurlenk May 24 '21 at 21:59
3

@lejlot's answer is entirely correct.

Your question is actually referring to the two main approaches:


Batch backpropagation

Update weights after all errors for all the input vectors are calculated.

Online backpropagation

Update weights after all errors for one input vector are calculated.

There is a third method called stochastic backpropagation, which is really just online backpropagation with a randomly selected training-pattern sequence.
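The three update schedules can be sketched as epoch loops; `gradient` and `apply_update` are hypothetical callbacks standing in for the actual network code:

```python
import random

def online_epoch(samples, gradient, apply_update):
    # Online: apply an update after every single sample.
    for s in samples:
        apply_update(gradient(s))

def batch_epoch(samples, gradient, apply_update):
    # Batch: accumulate gradients over all samples, then update once.
    total = None
    for s in samples:
        g = gradient(s)
        total = g if total is None else [a + b for a, b in zip(total, g)]
    apply_update([g / len(samples) for g in total])

def stochastic_epoch(samples, gradient, apply_update):
    # Stochastic: online backpropagation over a shuffled sample order.
    shuffled = samples[:]
    random.shuffle(shuffled)
    online_epoch(shuffled, gradient, apply_update)
```

With 100 samples, the online and stochastic loops call `apply_update` 100 times per epoch, while the batch loop calls it once.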

Time Complexity

On average, the batch backpropagation method is the fastest to converge, but the most difficult to implement. See a simple comparison here.

It is not possible to alter the weights of the output layer before computing the delta for the layer below:

Here you can see the mathematical equation for calculating the derivative
of the error with respect to the weights (using sigmoid):
O_i = the layer below   # ex: input
O_k = the current layer # ex: hidden layer
O_o = the layer above   # ex: output layer

(The image showed the standard sigmoid back-propagation gradient:)

    delta_k = O_k * (1 - O_k) * sum_o(delta_o * w_ko)
    dE/dw_ik = delta_k * O_i

As you can see, dE/dW depends on the weights of the layer above, so you may not alter them before calculating the deltas for each layer.
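A tiny numeric check of this dependency (the values below are made up for illustration): the hidden-layer delta is computed from the hidden-to-output weights, so updating those weights first yields a different, incorrect delta.

```python
import numpy as np

O_i = np.array([0.3])           # layer below (input)
O_k = np.array([0.6, 0.4])      # current (hidden) layer, sigmoid outputs
delta_o = np.array([0.1])       # delta already computed at the output layer
W_ko = np.array([[0.5, -0.2]])  # weights: hidden -> output

# delta for the hidden layer: O_k * (1 - O_k) * sum_o(delta_o * w_ko)
delta_k = O_k * (1 - O_k) * (W_ko.T @ delta_o).ravel()

# If W_ko had been updated first (1st approach), the delta comes out different:
W_ko_updated = W_ko - 0.5 * np.outer(delta_o, O_k)
delta_k_wrong = O_k * (1 - O_k) * (W_ko_updated.T @ delta_o).ravel()

# delta_k and delta_k_wrong differ, so the update order changes the gradient.
```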
jorgenkg
  • Thanks @jorgenkg. But in the book ISBN 978-987-1347-51-3 (in Spanish), chapter five discusses neural networks and uses the 1st approach. In this url: http://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf the implementation of the back-propagation algorithm updates the output-layer weights immediately after the output-layer error calculation, and only afterwards computes the error for the hidden layer and updates the hidden-layer weights. – Mr Rivero Aug 26 '13 at 21:14
  • I've appended to my answer. Here is a mathematical proof that the `first method` in your question is a mathematical error. [I still recommend this YouTube video. It taught me more than my professor.](https://www.youtube.com/watch?v=aVId8KMsdUU) – jorgenkg Aug 27 '13 at 08:56
  • Thanks @jorgenkg. I will review – Mr Rivero Aug 27 '13 at 17:41
  • I have a question @jorgenkg. I have 100 data points, including stars and circles, on a 2D surface. These have x and y coordinates; simply put, f(x, y) = star or circle. I am using [this article](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/). I am able to find my weights with many iterations, but only for one point from the 100-point data set. How can I update my weights online? – Yunus Temurlenk May 24 '21 at 21:59
  • "Online weight updates" tend to imply that the weights are adjusted by some learning rate for each pass over the instances in the training set. I gather that your dataset contains representations of stars and circles, and you aim to fit the NN as a regression of f(x, y of a shape) = [0, 1]. The weight updates are labeled "online" if you apply them sequentially: `error( f(shape1), target ) -> update weights -> error( f(shape2), target ) -> update weights`. – jorgenkg May 25 '21 at 05:20
-1

This question is not a matter of choosing between batch and online backpropagation.

Your question is a legitimate one, and I think both approaches are good. The two approaches behave almost the same over many epochs, but the 2nd looks just a little better, even though everyone uses the 1st.

PS: The 2nd approach only works with online backpropagation.

Matthieu H