
Currently, I am having problems with the backpropagation algorithm. I am trying to implement it and use it to recognize the direction faces are looking (left, right, down, straight). Basically, I have N images, read their pixels and scale the pixel values (0 to 255) to values from 0.0 to 1.0. All images are 32×30. I have an input layer of 960 neurons, a hidden layer of 3 neurons and an output layer of 4 neurons. For example, the output <0.1, 0.9, 0.1, 0.1> means that the person looks to the right. I followed the pseudocode. However, it doesn't work right: it does not compute the correct weights and consequently cannot handle the training and test examples. Here are parts of the code:

    // main function - it runs the algorithm
     private void runBackpropagationAlgorithm() {
        for (int i = 0; i < 900; ++i) {
            for (ImageUnit iu : images) {
                double [] error = calcOutputError(iu.getRatioMatrix(), iu.getClassification());
                changeHiddenUnitsOutWeights(error);
                error = calcHiddenError(error);
                changeHiddenUnitsInWeights(error,iu.getRatioMatrix());
            }
        }
    }

  // it creates the neural network
    private void createNeuroneNetwork() {
            Random generator = new Random();
            for (int i = 0; i < inHiddenUnitsWeights.length; ++i) {
                for (int j = 0; j < hiddenUnits; ++j) {
                    inHiddenUnitsWeights[i][j] = generator.nextDouble();
                }
            }
            for (int i = 0; i < hiddenUnits; ++i) {
                for (int j = 0; j < 4; ++j) {
                    outHddenUnitsWeights[i][j] = generator.nextDouble();
                }
            }
        }
   // Calculates the error in the network. It runs through the whole network.
private double [] calcOutputError(double[][] input, double [] expectedOutput) {
        int currentEdge = 0;
        Arrays.fill(hiddenUnitNodeValue, 0.0);
        for (int i = 0; i < input.length; ++i) {
            for (int j = 0; j < input[0].length; ++j) {
                for (int k = 0; k < hiddenUnits; ++k) {
                    hiddenUnitNodeValue[k] += input[i][j] * inHiddenUnitsWeights[currentEdge][k];
                }
                ++currentEdge;
            }
        }
        double[] out = new double[4];
        for (int j = 0; j < 4; ++j) {
            for (int i = 0; i < hiddenUnits; ++i) {
                out[j] += outHddenUnitsWeights[i][j] * hiddenUnitNodeValue[i];
            }
        }
        double [] error = new double [4];
        Arrays.fill(error, 4);
        for (int i = 0; i < 4; ++i) {
            error[i] = ((expectedOutput[i] - out[i])*(1.0-out[i])*out[i]);
            //System.out.println((expectedOutput[i] - out[i]) + " " + expectedOutput[i] + " " +  out[i]);
        }
        return error;
    }

// Changes the weights of the outgoing edges of the hidden neurons
private void changeHiddenUnitsOutWeights(double [] error) {
        for (int i = 0; i < hiddenUnits; ++i) {
            for (int j = 0; j < 4; ++j) {
                outHddenUnitsWeights[i][j] += learningRate*error[j]*hiddenUnitNodeValue[i];
            }
        }
    }

// goes back to the hidden units to calculate their error.
private double [] calcHiddenError(double [] outputError) {
        double [] error = new double[hiddenUnits];
        for (int i = 0; i < hiddenUnits; ++i) {
            double currentHiddenUnitErrorSum = 0.0;
            for (int j = 0; j < 4; ++j) {
                currentHiddenUnitErrorSum += outputError[j]*outHddenUnitsWeights[i][j];
            }
            error[i] = hiddenUnitNodeValue[i] * (1.0 - hiddenUnitNodeValue[i]) * currentHiddenUnitErrorSum;
        }
        return error;
    }

// changes the weights of the incomming edges to the hidden neurons. input is the matrix of ratios
private void changeHiddenUnitsInWeights(double [] error, double[][] input) {
        int currentEdge = 0;
        for (int i = 0; i < input.length; ++i) {
            for (int j = 0; j < input[0].length; ++j) {
                for (int k = 0; k < hiddenUnits; ++k) {
                    inHiddenUnitsWeights[currentEdge][k] += learningRate*error[k]*input[i][j];
                }
                ++currentEdge;
            }
        }
    }

As the algorithm runs, it computes bigger and bigger weights, which finally approach infinity (NaN values). I checked the code but, alas, I didn't manage to solve my problem. I will be very grateful to anyone who tries to help me.

  • Did you rule the 'precision' issue out of it? I mean, are you sure this isn't just a floating point issue? Other than that I'd guess your backprop or hidden out-weights don't calculate properly. Unless you tested this NN on a smaller sample and proved it working. – Shark Aug 16 '12 at 16:33
  • I presume it's not a floating point issue. I tried it on one example and ran the algorithm 9000 times. The output was still an array of NaN values. Just after the 5th iteration the values became infinite. I could not understand why this happens. – Мартин Радев Aug 16 '12 at 16:58
  • Does it learn the XOR problem correctly? This is very neat to debug such a thing. – Thomas Jungblut Aug 16 '12 at 17:51
  • I tried it, but the weights still rise to infinity... I'm going to try to debug it tomorrow – Мартин Радев Aug 16 '12 at 18:54
  • You should subtract the gradient to reach the minimum. – alfa Aug 21 '12 at 07:07

5 Answers


I didn't check all of your code. I just want to give you some general advice. I don't know whether your goal is (1) to learn the direction of faces or (2) to implement your own neural network.

In case (1) you should consider one of the existing libraries. They just work and give you much more flexible configuration options. For example, standard backpropagation is one of the worst optimization algorithms for neural networks. Its convergence depends on the learning rate. I can't see which value you chose in your implementation, but it could be too high. There are other optimization algorithms that don't require a learning rate or adapt it during training. In addition, 3 neurons in the hidden layer is most likely not enough. Most of the neural networks that have been used for images have hundreds and sometimes even thousands of hidden units. I would suggest you first try to solve your problem with a fully developed library. If that works, try implementing your own ANN, or just be happy. :)

In case (2) you should first try to solve a simpler problem. Take a very simple artificial data set, then a standard benchmark, and only then try it with your data. A good way to verify that your backpropagation implementation works is to compare it against numerical differentiation, as sketched below.
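A minimal gradient-check sketch, assuming a hypothetical loss(weights) callback that runs your network on a training example and returns the error; it perturbs one weight and compares the finite-difference slope with the gradient that backpropagation produced:

import java.util.function.ToDoubleFunction;

public class GradientCheck {
    // Compares backpropagation's gradient for one weight against a
    // central finite-difference estimate of d(loss)/d(weight).
    public static boolean check(double[] weights, int index,
                                ToDoubleFunction<double[]> loss,
                                double backpropGradient) {
        double eps = 1e-6;
        double original = weights[index];

        weights[index] = original + eps;
        double lossPlus = loss.applyAsDouble(weights);
        weights[index] = original - eps;
        double lossMinus = loss.applyAsDouble(weights);
        weights[index] = original;                         // restore the weight

        double numericGradient = (lossPlus - lossMinus) / (2 * eps);
        return Math.abs(numericGradient - backpropGradient) < 1e-4;  // rough tolerance
    }
}

If the two gradients disagree for more than a handful of weights, the backpropagation code (rather than the data or the learning rate) is the likely culprit.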

alfa

Your code is missing the transfer functions. It sounds like you want the logistic function with a softmax output. You need to include the following in calcOutputError

// Logistic transfer function for hidden layer. 
for (int k = 0; k < hiddenUnits; ++k) {
    hiddenUnitNodeValue[k] = logistic(hiddenUnitNodeValue[k]);
}

and

// Softmax transfer function for output layer.
double sum = 0;
for (int j = 0; j < 4; ++j) {
    out[j] = logistic(out[j]);
    sum += out[j];
}
for (int j = 0; j < 4; ++j) {
    out[j] = out[j] / sum;
}

where the logistic function is

public double logistic(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
}

Note that the softmax transfer function gives you outputs that sum to 1, so they can be interpreted as probabilities.
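For comparison, a textbook softmax exponentiates the raw weighted sums directly rather than passing them through the logistic function first; here is a minimal sketch applied to the out array before any logistic step, with the maximum subtracted for numerical stability:

// Standard softmax over the raw output sums (sketch).
double max = out[0];
for (int j = 1; j < 4; ++j) {
    max = Math.max(max, out[j]);
}
double sum = 0.0;
for (int j = 0; j < 4; ++j) {
    out[j] = Math.exp(out[j] - max);  // subtract the max to avoid overflow
    sum += out[j];
}
for (int j = 0; j < 4; ++j) {
    out[j] /= sum;
}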

Also, your calculation of the error gradient for the output layer is incorrect. It should simply be

for (int i = 0; i < 4; ++i) {
    error[i] = (expectedOutput[i] - out[i]);
} 
mtrsky

I haven't tested your code, but I am almost certain that you start out with too large weights. Most introductions to the subject leave it at "initialize the weights with random values", leaving out that the algorithm can actually diverge (go to Inf) for some starting values.

Try using smaller starting values, for example between -1/5 and 1/5, and shrink the range further if needed.
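A sketch of what that could look like in createNeuroneNetwork, keeping the question's weight arrays but drawing from roughly [-0.2, 0.2] instead of [0, 1):

// Small, zero-centred random initialization (sketch, range about [-0.2, 0.2]).
Random generator = new Random();
for (int i = 0; i < inHiddenUnitsWeights.length; ++i) {
    for (int j = 0; j < hiddenUnits; ++j) {
        inHiddenUnitsWeights[i][j] = (generator.nextDouble() - 0.5) * 0.4;
    }
}
for (int i = 0; i < hiddenUnits; ++i) {
    for (int j = 0; j < 4; ++j) {
        outHddenUnitsWeights[i][j] = (generator.nextDouble() - 0.5) * 0.4;
    }
}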

Additionally, write a method for matrix multiplication; you have used that (only) four times, and it will be much easier to see whether there is a problem there.
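Something along these lines, as a sketch of such a helper (the name multiply is made up; it covers the input-to-hidden and hidden-to-output products used in calcOutputError, assuming the input has been flattened to a vector):

// Multiplies a vector by a weight matrix: out[j] = sum over i of in[i] * weights[i][j] (sketch).
static double[] multiply(double[] in, double[][] weights, int outSize) {
    double[] out = new double[outSize];
    for (int i = 0; i < in.length; ++i) {
        for (int j = 0; j < outSize; ++j) {
            out[j] += in[i] * weights[i][j];
        }
    }
    return out;
}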

SlimJim

I had a similar problem with a neural network processing grayscale images. You have 960 input values ranging between 0 and 255. Even with small initial weights, you can end up having inputs to your neurons with a very large magnitude and the backpropagation algorithm gets stuck.

Try dividing each pixel value by 255 before passing it into the neural network. That's what worked for me. Just starting with extremely small initial weights wasn't enough, I believe due to the floating-point precision issue brought up in the comments.
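For instance, something like this when loading each image (a sketch; rawPixels is a hypothetical 32×30 array of 0-255 grayscale values):

// Scale raw 0..255 grayscale values into [0.0, 1.0] before training (sketch).
double[][] ratios = new double[32][30];
for (int i = 0; i < 32; ++i) {
    for (int j = 0; j < 30; ++j) {
        ratios[i][j] = rawPixels[i][j] / 255.0;
    }
}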

As suggested in another answer, a good way to test your algorithm is to see if your network can learn a simple function like XOR.

And for what it's worth, 3 neurons in the hidden layer was plenty for my purpose (identifying the gender of a facial image).

Eric G

I wrote an entirely new neural-network library and it works. I am sure that in my previous attempt I missed the idea of using transfer functions and their derivatives. Thank you, all!
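For anyone hitting the same wall, the missing piece was roughly this: apply a transfer function to each unit's weighted sum in the forward pass, and multiply by its derivative when computing the deltas in the backward pass. A minimal sigmoid sketch (hypothetical helper methods, not code from the library mentioned above):

// Sigmoid transfer function and its derivative (sketch).
static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
}

// The derivative expressed through the activation a = sigmoid(x).
static double sigmoidDerivative(double a) {
    return a * (1.0 - a);
}

// Forward pass:  activation = sigmoid(weightedSum)
// Output delta:  delta = (target - activation) * sigmoidDerivative(activation)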