
I found an example online that contains a method which back-propagates the error and adjusts the weights. I was wondering how exactly this works and which weight-update algorithm is used. Could it be gradient descent?

/**
 * Propagates the error from all outputs back through the network.
 *
 * @param expectedOutput
 *            first calculate the partial derivative of the error with
 *            respect to each of the weights leading into the output
 *            neurons; the bias is also updated here
 */
public void applyBackpropagation(double expectedOutput[]) {

    // error check: clamp expected values into the open interval ]0;1[
    for (int i = 0; i < expectedOutput.length; i++) {
        double d = expectedOutput[i];
        if (d < 0 || d > 1) {
            if (d < 0)
                expectedOutput[i] = 0 + epsilon;
            else
                expectedOutput[i] = 1 - epsilon;
        }
    }

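    // update weights for the connections leading into the output layer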
    int i = 0;
    for (Neuron n : outputLayer) {
        ArrayList<Connection> connections = n.getAllInConnections();
        for (Connection con : connections) {
            double ak = n.getOutput();
            double ai = con.leftNeuron.getOutput();
            double desiredOutput = expectedOutput[i];

            double partialDerivative = -ak * (1 - ak) * ai
                    * (desiredOutput - ak);
            double deltaWeight = -learningRate * partialDerivative;
            double newWeight = con.getWeight() + deltaWeight;
            con.setDeltaWeight(deltaWeight);
            con.setWeight(newWeight + momentum * con.getPrevDeltaWeight());
        }
        i++;
    }

    // update weights for the hidden layer
    for (Neuron n : hiddenLayer) {
        ArrayList<Connection> connections = n.getAllInConnections();
        for (Connection con : connections) {
            double aj = n.getOutput();
            double ai = con.leftNeuron.getOutput();
            double sumKoutputs = 0;
            int j = 0;
            for (Neuron out_neu : outputLayer) {
                double wjk = out_neu.getConnection(n.id).getWeight();
                double desiredOutput = (double) expectedOutput[j];
                double ak = out_neu.getOutput();
                j++;
                sumKoutputs = sumKoutputs
                        + (-(desiredOutput - ak) * ak * (1 - ak) * wjk);
            }

            double partialDerivative = aj * (1 - aj) * ai * sumKoutputs;
            double deltaWeight = -learningRate * partialDerivative;
            double newWeight = con.getWeight() + deltaWeight;
            con.setDeltaWeight(deltaWeight);
            con.setWeight(newWeight + momentum * con.getPrevDeltaWeight());
        }
    }
}
unleashed
  • As far as I can tell this represents an MLP with one hidden layer. And without being able to follow this code, I guess the backpropagation results in a form of gradient descent. What exactly is your question: whether you can use this code, or what gradient descent is? – Tim Feb 24 '12 at 13:44

2 Answers


It seems to me this solution uses stochastic gradient descent. The main difference between it and regular gradient descent is that the gradient is approximated for each example instead of being calculated over all examples before choosing the best direction. This is the usual approach to implementing backpropagation, and it even has some advantages over batch gradient descent (it can escape some local minima). I believe the article also explains the idea, and there are a lot of other articles that explain the main idea behind back-propagation.
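
For illustration, here is a rough sketch of the kind of outer training loop that usually drives such a method (the `NeuralNetwork` type and its `setInput`/`activate` methods are assumed names for this sketch; only `applyBackpropagation` comes from the question). The point is that the weights are updated after every single sample, which is what makes the procedure stochastic:

    // Hypothetical driver loop (class and method names other than
    // applyBackpropagation are assumptions, not part of the question's code).
    double[][] inputs  = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };   // e.g. XOR inputs
    double[][] targets = { {0},    {1},    {1},    {0}    };   // one value per output neuron

    NeuralNetwork net = new NeuralNetwork(2, 3, 1);  // assumed: 2 inputs, 3 hidden, 1 output

    for (int epoch = 0; epoch < 10000; epoch++) {
        for (int s = 0; s < inputs.length; s++) {
            net.setInput(inputs[s]);                 // assumed: load one training sample
            net.activate();                          // assumed: forward pass
            net.applyBackpropagation(targets[s]);    // weight update from this sample only
        }
    }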

Ivaylo Strandjev
  • If this is SGD, then where is the loop over the training samples? There's a loop over the output units that increments `i` as if the number of output units is equal to the number of training samples, which seems totally absurd. – Fred Foo Feb 24 '12 at 13:52
  • I think there should be a loop that calls this function, and the function actually adapts the weights to only a single sample. This can be derived from the fact that the only input to the function is the expected values of all the output perceptrons (only one value per neuron). – Ivaylo Strandjev Feb 24 '12 at 14:02
  • Ah, right! I was expecting `expectedOutput` to have length `n_samples`. – Fred Foo Feb 24 '12 at 14:30

This ugly-looking article seems to describe exactly the same version of the algorithm: http://www.speech.sri.com/people/anand/771/html/node37.html. I have the same formulas in my university papers, but regretfully: a) they are not available online; b) they are in a language you will not understand.

As for gradient descent: the algorithm resembles gradient descent, but it is not guaranteed to reach the optimal position. In each step, the network's edge weights are changed so that the probability of the training example's target value increases.
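
For reference, written out as formulas, the update rule the code above appears to implement (assuming a sigmoid activation, which the a(1 - a) factors suggest, and a squared-error cost) is plain gradient descent with a momentum term; here t_k is the desired output, a_i, a_j, a_k are the activations of input, hidden and output neurons, \eta is the learning rate and \alpha the momentum:

    % weight w_{jk} leading into output neuron k:
    \Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}}
                  = \eta \, (t_k - a_k) \, a_k (1 - a_k) \, a_j

    % weight w_{ij} leading into hidden neuron j:
    \Delta w_{ij} = \eta \, a_j (1 - a_j) \, a_i \sum_k (t_k - a_k) \, a_k (1 - a_k) \, w_{jk}

    % both layers then apply the same momentum step:
    w \leftarrow w + \Delta w + \alpha \, \Delta w_{\mathrm{prev}}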

Boris Strandjev