Gradient checking in backpropagation

Question

I'm trying to implement gradient checking for a simple feedforward neural network with a 2-unit input layer, a 2-unit hidden layer and a 1-unit output layer. What I do is the following:

Take each weight w of the network weights between all layers and perform forward propagation using w + EPSILON and then w - EPSILON.
Compute the numerical gradient using the results of the two feedforward propagations.

What I don't understand is how exactly to perform the backpropagation. Normally, I compare the output of the network to the target data (in the case of classification) and then backpropagate the error derivative through the network. However, I think in this case some other value has to be backpropagated, since the results of the numerical gradient computation do not depend on the target data (only on the input), while error backpropagation does depend on the target data. So, what value should be used in the backpropagation part of the gradient check?

phoxis · Accepted Answer · 2021-03-29T09:32:54.407

Backpropagation computes the gradients analytically, and those gradient formulas are then used to update the weights during training. A neural network is essentially a multivariate function whose coefficients (parameters) need to be found, i.e. trained.

The gradient with respect to a specific variable is the rate of change of the function value with respect to that variable. Therefore, as you mentioned, we can approximate the gradient of any function, including a neural network's cost, numerically from the definition of the first derivative.
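As a sketch of that approximation, with J standing for the cost as a function of a single weight w and ε for a small perturbation (both symbols are introduced here for illustration, not taken from the question):

$$
\frac{\partial J}{\partial w}
= \lim_{\epsilon \to 0} \frac{J(w + \epsilon) - J(w)}{\epsilon}
\approx \frac{J(w + \epsilon) - J(w - \epsilon)}{2\epsilon}
$$

All other weights are held fixed while w is perturbed.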

To check whether the analytical gradient of your neural network is correct, it is good practice to compare it against a numerically computed gradient.

For each weight matrix w_l in W = [w_0, w_1, ..., w_l, ..., w_k]
    For i in 0 to (number of rows in w_l) - 1
        For j in 0 to (number of columns in w_l) - 1
            w_l_minus = copy(w_l)  # Copy all the weights (a real copy, not a reference)
            w_l_minus[i,j] = w_l_minus[i,j] - eps  # Change only this parameter

            w_l_plus = copy(w_l)  # Copy all the weights (a real copy, not a reference)
            w_l_plus[i,j] = w_l_plus[i,j] + eps  # Change only this parameter

            cost_minus = cost of neural net with w_l replaced by w_l_minus
            cost_plus = cost of neural net with w_l replaced by w_l_plus

            w_l_grad[i,j] = (cost_plus - cost_minus)/(2*eps)

This process changes only one parameter at a time and computes the numerical gradient. Here I have used the central difference (f(x+h) - f(x-h))/(2h), which seems to work better for me; its truncation error is O(h^2), versus O(h) for the one-sided form (f(x+h) - f(x))/h.
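The loop above can be sketched in runnable form roughly as follows. This is a minimal NumPy sketch, not the answerer's own code: `numerical_gradient`, `cost_fn` and the toy quadratic cost are hypothetical names chosen for illustration, and it perturbs each weight in place and restores it afterwards instead of copying the whole matrix.

```python
import numpy as np

def numerical_gradient(cost_fn, weights, eps=1e-5):
    """Central-difference estimate of d(cost)/dw for every weight entry.

    cost_fn(weights) is assumed to return a scalar cost;
    weights is a list of float NumPy arrays (one per layer).
    """
    grads = []
    for w in weights:
        grad = np.zeros_like(w)
        for idx in np.ndindex(w.shape):
            original = w[idx]
            w[idx] = original + eps          # change only this parameter
            cost_plus = cost_fn(weights)
            w[idx] = original - eps
            cost_minus = cost_fn(weights)
            w[idx] = original                # restore before moving on
            grad[idx] = (cost_plus - cost_minus) / (2 * eps)
        grads.append(grad)
    return grads

# Toy check (hypothetical cost): for cost = sum(w^2) the true gradient is 2*w.
weights = [np.array([[1.0, -2.0], [0.5, 3.0]])]
toy_cost = lambda ws: sum(np.sum(w ** 2) for w in ws)
toy_grads = numerical_gradient(toy_cost, weights)
print(toy_grads[0])
```

For a real network, `cost_fn` would run a forward pass on the training inputs and evaluate the cost against the targets.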

Note that you mentioned that the results of the numerical gradient computation do not depend on the target data. This is not true: when you compute cost_minus and cost_plus above, the cost is computed on the basis of

The weights
The target classes

Therefore, backpropagation itself stays exactly as usual, independent of the gradient check. Compute the numerical gradients before the backpropagation weight update. Then compute the gradients using backpropagation for one epoch, on the same data and with the same weights (looping over the weights in a way similar to the above). Finally, compare each component of the gradient vectors/matrices and check that they are close enough.
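A minimal sketch of that comparison for the 2-2-1 network from the question. Everything not stated in the thread is an assumption made for illustration: sigmoid activations, no bias terms, a squared-error cost 0.5 * (y - t)^2, a single training example, and the relative-error measure used to decide what "close enough" means.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, x):
    w1, w2 = weights
    h = sigmoid(w1 @ x)   # hidden activations, shape (2,)
    y = sigmoid(w2 @ h)   # network output, shape (1,)
    return h, y

def cost(weights, x, t):
    # The cost depends on the weights, the input AND the target t.
    _, y = forward(weights, x)
    return 0.5 * np.sum((y - t) ** 2)

def backprop_gradients(weights, x, t):
    # Ordinary backpropagation of the error derivative (y - t).
    w1, w2 = weights
    h, y = forward(weights, x)
    delta2 = (y - t) * y * (1 - y)           # output-layer delta
    delta1 = (w2.T @ delta2) * h * (1 - h)   # hidden-layer delta
    return [np.outer(delta1, x), np.outer(delta2, h)]

def numerical_gradients(weights, x, t, eps=1e-5):
    grads = []
    for w in weights:
        g = np.zeros_like(w)
        for idx in np.ndindex(w.shape):
            orig = w[idx]
            w[idx] = orig + eps
            c_plus = cost(weights, x, t)
            w[idx] = orig - eps
            c_minus = cost(weights, x, t)
            w[idx] = orig                    # restore the weight
            g[idx] = (c_plus - c_minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 2)), rng.normal(size=(1, 2))]
x = np.array([0.5, -1.0])   # arbitrary example input
t = np.array([1.0])         # arbitrary example target

analytic = backprop_gradients(weights, x, t)
numeric = numerical_gradients(weights, x, t)
rel_errors = [
    np.linalg.norm(a - n) / (np.linalg.norm(a) + np.linalg.norm(n))
    for a, n in zip(analytic, numeric)
]
print(rel_errors)  # each value should be very small if backprop is correct
```

Note that the same target `t` enters both `backprop_gradients` and `numerical_gradients` through `cost`, so nothing special has to be backpropagated for the check.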

The formula you provided seems to be wrong, it calculates the negative gradient. The correct formula should be `(f(x+h) - f(x-h))/2h` and `w_l_grad[i,j] = (cost_plus-cost_minus)/(2*eps)` [source](http://ufldl.stanford.edu/tutorial/supervised/DebuggingGradientChecking/) — cwallenwein, Mar 29 '21 at 09:30
Thanks for noting this. Not sure why I did that. But fixed it now. — phoxis, Mar 29 '21 at 09:35

Paul Manta · Answer 2 · 2014-10-04T14:27:08.100

-1

Whether you want to do some classification or have your network calculate a certain numerical function, you always have some target data. For example, let's say you wanted to train a network to calculate the function f(a, b) = a + b. In that case, this is the input and target data you want to train your network on:

 a       b     Target

 1       1        2
 3       4        7
21       0       21
 5       2        7

        ...

Just as with "normal" classification problems, the more input-target pairs, the better.

edited Oct 04 '14 at 14:27

answered Oct 04 '14 at 12:08


Thank you for your response. However, I'm asking specifically about the case of the gradient checking procedure. – bdfgegtertasdg Oct 04 '14 at 12:15
Backpropagation is done exactly the same. A classification problem is just another kind of numerical problem. You need to choose an appropriate error function for your specific case, compute its derivative, and backpropagate that derivative. – Paul Manta Oct 04 '14 at 12:18
Then let `T` be the vector of target values, `Y` be the current output vector, and `msq` be the error function. You will need to backpropagate the derivative `d/dy msq(T, Y)`, for each `y` in `Y`. – Paul Manta Oct 04 '14 at 14:16
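As a rough illustration of that last comment, assuming `msq` means the mean squared error (the thread does not define it) and using the targets from the table above with made-up outputs `Y`:

```python
import numpy as np

def msq(T, Y):
    # Assumed definition: mean squared error over all outputs.
    return np.mean((Y - T) ** 2)

def msq_grad(T, Y):
    # d/dy msq(T, Y) for each y in Y; this is what gets backpropagated.
    return 2.0 * (Y - T) / Y.size

T = np.array([2.0, 7.0, 21.0, 7.0])   # targets from the a + b table
Y = np.array([1.5, 7.5, 20.0, 7.0])   # hypothetical current outputs

analytic = msq_grad(T, Y)

# Central-difference check of the same derivative, one output at a time.
eps = 1e-6
numeric = np.zeros_like(Y)
for i in range(Y.size):
    y_plus, y_minus = Y.copy(), Y.copy()
    y_plus[i] += eps
    y_minus[i] -= eps
    numeric[i] = (msq(T, y_plus) - msq(T, y_minus)) / (2 * eps)

print(analytic)
print(numeric)
```

The two printed vectors should agree closely, which is the same check applied to the error function alone rather than to the whole network.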
