
Here is my backpropagation code. I think it is wrong because the difference between the gradient it computes and my numerical estimate is too large. It doesn't seem to be due to wrongly inverting matrices, etc.

For context, X holds the inputs (one row per training example), Y holds the target outputs (one column per training example), and the network has a single hidden layer. Theta1 holds the weights from the input layer to the hidden layer, and Theta2 holds the weights from the hidden layer to the output layer.

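for t = 1:m

    % Forward propagation for example t. The loop index is t, so row t of X
    % is used here (the indexing was X(i,:), which does not follow the loop).
    a1 = [1 X(t, :)];
    a2 = [1 sigmoid(a1 * Theta1')];
    a3 = sigmoid(a2 * Theta2');

    % Output-layer error for example t
    delta_3 = a3' - Y(:, t);

    % Hidden-layer error, then drop the entry for the bias unit
    delta_2 = (Theta2' * delta_3) .* a2' .* (1 - a2)';
    delta_2 = delta_2(2:end, :);

    % Accumulate the gradients, reusing the activations computed above
    Theta1_grad = Theta1_grad + delta_2 * a1;
    Theta2_grad = Theta2_grad + delta_3 * a2;

end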

grad = [Theta1_grad(:) ; Theta2_grad(:)];
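
For reference, the numerical estimate comes from a central-difference check along the lines below. This is a minimal sketch under stated assumptions, not the exercise's actual code: the toy cost `J`, the step size `e`, and the relative-difference formula are all illustrative choices, with the toy cost standing in for the network's cost function.

```octave
% Toy cost with a known gradient (assumption: stands in for the real cost).
theta = [1; -2; 0.5];
J = @(th) sum(th .^ 2);   % scalar cost of an unrolled parameter vector
grad = 2 * theta;         % analytic gradient of the toy cost

% Central-difference estimate, one parameter at a time.
e = 1e-4;                 % step size (assumed value)
numgrad = zeros(size(theta));
for p = 1:numel(theta)
    perturb = zeros(size(theta));
    perturb(p) = e;
    numgrad(p) = (J(theta + perturb) - J(theta - perturb)) / (2 * e);
end

% Relative difference between the two gradients (assumed metric).
rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
```

With a correct analytic gradient, `rel_diff` should be very small. With the backprop gradient above it is much larger than the exercise says it should be.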
Angus Fong
  • On what kind of data does the error occur? Which values did you observe / expect? – m8mble Jan 07 '17 at 11:44
  • The relative difference between the gradient-checking and backprop results shows up at the 3rd decimal place. It shouldn't: the exercise says any difference should appear at the 9th decimal place at most. – Angus Fong Jan 07 '17 at 17:56
