Gradient descent not working without normalization, why?

Question

My question is based on the data from the Coursera course https://www.coursera.org/learn/machine-learning/, but after a search it appears to be a common problem.

Gradient descent works perfectly on normalized data (pic. 1), but goes in the wrong direction on the original data (pic. 2), with J (the cost function) growing very fast toward infinity. The difference between the parameter values is about 10^3.

I thought normalization was only needed for better execution speed. I really can't see a reason for this growth in the cost function, even after a lot of searching. Decreasing 'alpha', e.g. to 0.001 or 0.0001, doesn't help either.

Please post if you have any ideas!

P.S. I passed the matrices to the functions manually: X_buf is the normalized version and X_basic is the original; Y is the vector of all examples, Q is the theta vector, and alpha is the learning rate.

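function [theta, J_history] = gradientDescentMulti(X, Y, theta, alpha, num_iters)
% Batch gradient descent for linear regression.
% Returns the final theta and the cost after every iteration.

m = length(Y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    % vectorized simultaneous update of all parameters
    theta = theta - (alpha/m)*X'*(X*theta-Y);
    J_history(iter) = computeCostMulti(X, Y, theta);
end

end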

And the second function:

function J = computeCostMulti(X, Y, theta)
% Squared-error cost for linear regression.

m = length(Y); % number of training examples
residual = X*theta - Y; % prediction error for each example
J = (1/(2*m))*(residual'*residual);

end
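
One way to check whether the learning rate is simply too large for the unscaled data is to compare alpha with 2/lambda_max, where lambda_max is the largest eigenvalue of X'*X/m. For a least-squares cost like the one above, that is the usual upper limit for a fixed step size. The sketch below is only an illustration under assumptions: X_demo, Y_demo, alpha_max and the data values are made up for this example and are not the course data set, and the update loop is inlined so the snippet runs on its own.

```octave
% Synthetic data (NOT the course data): one feature of order 10^3, one of order 1.
m = 50;
x1 = linspace(500, 4500, m)';            % large-scale feature
x2 = mod((1:m)', 5) + 1;                 % small-scale feature, values 1..5
X_demo = [ones(m, 1), x1, x2];           % intercept column plus two features
Y_demo = 100 * x1 + 5000 * x2 + 20000;   % noiseless linear target

% Curvature of the cost J: H = X'X/m. A fixed step size must stay below 2/lambda_max.
H = (X_demo' * X_demo) / m;
alpha_max = 2 / max(eig(H));
fprintf('largest usable alpha is about %.3g\n', alpha_max);

% Same cost as computeCostMulti, written inline.
cost = @(theta) sum((X_demo * theta - Y_demo) .^ 2) / (2 * m);
J0 = cost(zeros(3, 1));

% Run 20 iterations with one alpha above the limit and one below it.
alphas = [1e-4, 0.5 * alpha_max];
J_final = zeros(size(alphas));
for k = 1:numel(alphas)
    theta = zeros(3, 1);
    for iter = 1:20
        theta = theta - (alphas(k) / m) * X_demo' * (X_demo * theta - Y_demo);
    end
    J_final(k) = cost(theta);
    fprintf('alpha = %.3g: J went from %.3g to %.3g\n', alphas(k), J0, J_final(k));
end
```

With data on this scale the limit comes out far below 0.0001, so the first run should blow up while the second should reduce the cost. Running the same check on X_basic would show whether the alphas tried so far were still too large.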

Screenshots

Have you tried the course bulletin board for help? The TAs are pretty responsive. I don't see a problem with your code, but it's also been a year since I worked inside the ML functionality. — Prune, Jun 06 '16 at 18:44
try learning rate 1e-10 (depending on the scale of your unnormalized data) — lejlot, Jun 06 '16 at 22:15
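
The dependence of the learning rate on the scale of the data, as suggested in the comment above, can be made concrete for this cost function. What follows is a sketch of the standard argument rather than anything stated in the question: the symbols \theta^{*} (a minimizer of J), H and \lambda_i are introduced here, and the feature magnitude of 10^3 is an assumption based on the description of the data.

```latex
\begin{align*}
\theta_{k+1} &= \theta_k - \frac{\alpha}{m} X^{\top}(X\theta_k - Y),
  \qquad H = \frac{1}{m} X^{\top} X, \\
\theta_{k+1} - \theta^{*} &= (I - \alpha H)(\theta_k - \theta^{*})
  \qquad \text{since } X^{\top} X \theta^{*} = X^{\top} Y, \\
J \text{ converges to its minimum} &\iff |1 - \alpha \lambda_i| < 1
  \text{ for every nonzero eigenvalue } \lambda_i \text{ of } H
  \iff 0 < \alpha < \frac{2}{\lambda_{\max}(H)}, \\
\lambda_{\max}(H) &\ge H_{jj} = \frac{1}{m} \sum_{i=1}^{m} x_{ij}^{2} \approx 10^{6}
  \quad \text{if feature } j \text{ has values near } 10^{3}.
\end{align*}
```

Under that assumption alpha would have to be below roughly 2*10^-6. With alpha = 0.001 or 0.0001 the error along the dominant direction is multiplied by a factor of magnitude around 10^3 or 10^2 in every iteration, which would match the runaway cost described in the question. After normalization the diagonal entries of H are about 1, so lambda_max is at most roughly the number of columns of X, and the same alpha is then well inside the stable range.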
