When using the chain rule to calculate the slope of the cost function with respect to the weights at layer $L$, the formula becomes:

$$\frac{\partial C_0}{\partial w^{(L)}} = \ldots \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \ldots$$
with:

- $z^{(L)}$ being the induced local field: $z^{(L)} = w_1^{(L)} a_1^{(L-1)} + w_2^{(L)} a_2^{(L-1)} + \ldots$
- $a^{(L)}$ being the output: $a^{(L)} = \sigma(z^{(L)})$
- $\sigma$ being the sigmoid function used as the activation function.

Note that $L$ is used as a layer indicator, not as an index.
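For reference, writing out the factors that the ellipsis stands for in the usual backpropagation derivation (this expansion is my own addition, not part of the quoted formula):

$$\frac{\partial C_0}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C_0}{\partial a^{(L)}}$$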
Now:

$$\frac{\partial a^{(L)}}{\partial z^{(L)}} = \sigma'(z^{(L)})$$

with $\sigma'$ being the derivative of the sigmoid function.
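For concreteness, here is a minimal sketch of the sigmoid and its derivative taken as a function of the local field $z$ (my own helper names, not code from the post):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)), evaluated at the local field z
    s = sigmoid(z)
    return s * (1.0 - s)
```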
The problem:

But in this post by James Loy on building a simple neural network from scratch with Python, when doing the backpropagation he does not give $z^{(L)}$ as the input to $\sigma'$ to replace $\frac{\partial a^{(L)}}{\partial z^{(L)}}$ in the chain rule. Instead, he gives the output (the last activation) of layer $L$ as the input to the sigmoid derivative $\sigma'$:
```python
def feedforward(self):
    self.layer1 = sigmoid(np.dot(self.input, self.weights1))
    self.output = sigmoid(np.dot(self.layer1, self.weights2))

def backprop(self):
    # application of the chain rule to find derivative of the loss function
    # with respect to weights2 and weights1
    d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))
```
Note that in the code above, layer $L$ is layer 2, which is the last (output) layer. In `sigmoid_derivative(self.output)`, the activation of the current layer is given as the input to the derivative of the sigmoid function used as the activation function.
The question:

Shouldn't we use `sigmoid_derivative(np.dot(self.layer1, self.weights2))` instead of `sigmoid_derivative(self.output)`?
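To make the distinction concrete, these are the two conventions I have in mind (a sketch with my own function names and assumptions; the post's actual `sigmoid_derivative` definition may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Convention 1: the derivative takes the local field z = np.dot(self.layer1, self.weights2)
# and evaluates the sigmoid itself.
def sigmoid_derivative_of_z(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Convention 2: the derivative takes the activation a = sigmoid(z), which feedforward
# already stored in self.output, and reuses it instead of recomputing the sigmoid.
def sigmoid_derivative_of_a(a):
    return a * (1.0 - a)

z = np.array([0.5, -1.2, 2.0])
a = sigmoid(z)
print(sigmoid_derivative_of_z(z))  # sigma'(z)
print(sigmoid_derivative_of_a(a))  # the same values, since a = sigmoid(z)
```

So whether `sigmoid_derivative(self.output)` is correct seems to depend on which of these two conventions the post's `sigmoid_derivative` follows.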