
This problem is more conceptual than code-specific, so the fact that this is written in JS shouldn't matter very much.

So I'm trying to make a Neural Network, and I'm testing it by training it on a simple task - an OR gate (or really just any logic gate). I'm using Gradient Descent without any batches for the sake of simplicity (batches seem unnecessary for this task, and the less unnecessary code I have, the easier it is to debug).

However, after many iterations the output always converges to the average of the outputs. For example, given this training set:

[0,0] = 0
[0,1] = 1
[1,0] = 1
[1,1] = 0

The outputs, no matter the inputs, always converge around 0.5. If the training set is:

[0,0] = 0
[0,1] = 1
[1,0] = 1
[1,1] = 1

The outputs always converge around 0.75 - the average of all the training outputs. This appears to be true for all combinations of outputs.

It seems like this is happening because whenever it's given a sample with a target of 0, it changes the weights to get closer to 0, and whenever it's given a sample with a target of 1, it changes the weights to get closer to 1, meaning that over time it will converge around the average.
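
One way to see why the average is the stable point: if the network effectively outputs a single constant c for every input, the total squared error over the set is minimized exactly at the mean of the targets. A quick standalone sketch (the probe values are made up, just to illustrate):

var targets = [0, 1, 1, 0]; //the XOR-style set above
function totalError(c){ //total squared error if the net outputs c for every sample
    var sum = 0;
    for(var i = 0; i < targets.length; i ++){
        sum += (c - targets[i]) * (c - targets[i]);
    }
    return sum;
}
console.log(totalError(0.5)); //1.00 - the minimum, at the mean of the targets
console.log(totalError(0.4)); //1.04
console.log(totalError(0.6)); //1.04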

Here's the backpropagation code (written in JavaScript):

this.backpropigate = function(data){
    //Sets the inputs
    for(var i = 0; i < this.layers[0].length; i ++){
        if(i < data[0].length){
            this.layers[0][i].output = data[0][i];
        }
        else{
            this.layers[0][i].output = 0;
        }
    }
    this.feedForward(); //Rerun through the NN with the new set outputs
    for(var i = this.layers.length-1; i >= 1; i --){
        for(var j = 0; j < this.layers[i].length; j ++){
            var ref = this.layers[i][j];
            //Calculate the gradients for each Neuron
            if(i == this.layers.length-1){ //Output layer neurons
                var error = ref.output - data[1][j]; //dE/do for squared error
                ref.gradient = error * ref.output * (1 - ref.output); //times o*(1-o), the sigmoid derivative
            }
            else{ //Hidden layer neurons
                var gradSum = 0; //Find sum from the next layer
                for(var m = 0; m < this.layers[i+1].length; m ++){
                    var ref2 = this.layers[i+1][m];
                    gradSum += (ref2.gradient * ref2.weights[j]);
                }
                ref.gradient = gradSum * ref.output * (1-ref.output);
            }
            //Update each of the weights based off of the gradient
            for(var m = 0; m < ref.weights.length; m ++){
                //Find the corresponding neuron in the previous layer
                var ref2 = this.layers[i-1][m];
                ref.weights[m] -= LEARNING_RATE*ref2.output*ref.gradient;
            }
        }
    }
    this.feedForward();
};

Here, the NN is structured so that each Neuron is an object with inputs, weights, and an output that is calculated from those inputs and weights. The Neurons are stored in a 2D 'layers' array, where the first dimension is the layer (so the first layer is the inputs, the second is the hidden layer, etc.) and the second dimension is the list of Neuron objects inside that layer. The 'data' passed in is given in the form [inputs, correct-output], e.g. [[0,1],[1]].
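
For concreteness, here is a small sketch of that layout (the actual Neuron constructor isn't shown, so the weight values below are made-up placeholders; 'weights', 'output', and 'gradient' are the only fields the code above relies on):

var layers = [
    //Input layer: outputs are set directly from the data
    [ {output: 0}, {output: 1} ],
    //Hidden layer: 2 Neurons, one weight per input Neuron
    [ {weights: [0.1, -0.4], output: 0, gradient: 0},
      {weights: [0.3, 0.2], output: 0, gradient: 0} ],
    //Output layer: 1 Neuron, one weight per hidden Neuron
    [ {weights: [0.5, -0.2], output: 0, gradient: 0} ]
];
var data = [[0,1],[1]]; //[inputs, correct-output]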

Also my LEARNING_RATE is 1 and my hidden layer has 2 Neurons.

I feel like there must be some conceptual issue with my backpropagation method, as I've tested the other parts of my code (like the feedForward part) and they work fine. I tried to use various sources, though I mostly relied on the Wikipedia article on backpropagation and the equations it gave me.
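
For reference, these are the equations the code above implements (assuming sigmoid activations, so the derivative is o(1 - o)); they match the standard backpropagation formulas, with output o, target t, and learning rate \eta:

\delta_j^{\text{out}} = (o_j - t_j)\, o_j (1 - o_j)
\delta_j^{\text{hidden}} = \Big( \sum_k \delta_k w_{kj} \Big)\, o_j (1 - o_j)
w_{ji} \leftarrow w_{ji} - \eta\, \delta_j\, o_i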

I know my code may be confusing to read, though I tried to make it as simple to understand as possible. Any help would be greatly appreciated.
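
In case it helps anyone reproducing this: a minimal driver loop for this setup might look like the sketch below. `net` is an assumed, already-constructed network object (the constructor isn't shown in the question), and the many full passes at a small LEARNING_RATE mirror what's reported in the comments below (0.05 and 250,000 passes).

var trainingSet = [ //the OR set from above
    [[0,0],[0]],
    [[0,1],[1]],
    [[1,0],[1]],
    [[1,1],[1]]
];
for(var epoch = 0; epoch < 250000; epoch ++){ //many passes are needed at a small LEARNING_RATE
    for(var k = 0; k < trainingSet.length; k ++){
        net.backpropigate(trainingSet[k]);
    }
}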

  • Start by lowering your learning rate; 1.0 is too high. – Dr. Snoopy Jul 30 '17 at 20:37
  • @MatiasValdenegro Sure, I tried that and it doesn't change much. The thing is, I print out a list of the outputs for [1,1] (which should be 1) every time it backpropigates, and it trends downward constantly until it reaches 0.25, which is the average in the set. – YourHomicidalApe Jul 30 '17 at 20:42
  • EDIT: Sorry for potentially wasting your time, but your original comment did actually help. This being my first time using a Neural Network, I didn't realize how many repetitions through the set were required to actually create a working system. At a 0.05 LEARNING_RATE, it took 250,000 repetitions through the full training set to get it to work. Is this normal, or is my program inefficient? I assume it's a mix of both because I haven't implemented any optimizations/efficiencies yet. – YourHomicidalApe Jul 30 '17 at 20:51
  • Sounds normal, the learning rate and number of iterations are related. A smaller learning rate requires more iterations, but if you set the learning rate too large then learning can diverge. – Dr. Snoopy Jul 30 '17 at 22:59
  • I suggest having a look [here](https://datascience.stackexchange.com/questions/11589/creating-neural-net-for-xor-function), because XOR is a known problem for NNs. – Nina Scholz Jul 31 '17 at 06:47
  • The XOR problem has quite a strong local minimum. It will converge to either `[1,1,1,0]` or `[0,1,1,1]` if it gets stuck in there. First of all, you're going to need more than 2 hidden neurons to solve this problem. I believe a minimum of **3** hidden neurons is needed. Then change your learning rate to `0.3` (or `0.1`). That should work. – Thomas Wagenaar Aug 01 '17 at 14:36

0 Answers