My question is based on the data from the Coursera course https://www.coursera.org/learn/machine-learning/, but after searching it appears to be a common problem.
Gradient descent works perfectly on the normalized data (pic.1), but goes in the wrong direction on the original data (pic.2), with J (the cost function) growing very fast toward infinity. The feature values differ from each other in magnitude by a factor of about 10^3.
I thought normalization was only needed for faster execution, so I really can't see a reason for this growth in the cost function, even after a lot of searching. Decreasing 'alpha', e.g. to 0.001 or 0.0001, doesn't help either.
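To make "normalized" concrete, here is a minimal sketch of the step, assuming the usual mean/standard-deviation scaling. The numbers in X_raw are made up for illustration and are not the course data, and the leading column of ones is an assumption based on the course convention for the intercept term.

```octave
% Hypothetical raw features: column 1 is large-scale, column 2 is small-scale.
X_raw = [2104 3; 1600 3; 2400 3; 1416 2; 3000 4];

mu    = mean(X_raw);               % 1 x n row vector of column means
sigma = std(X_raw);                % 1 x n row vector of column standard deviations
X_norm = (X_raw - mu) ./ sigma;    % relies on broadcasting (Octave, or MATLAB R2016b+)

% Assumed convention: prepend a column of ones for the intercept term.
X_buf = [ones(size(X_norm, 1), 1) X_norm];
```

After this step every feature column has mean 0 and standard deviation 1, so the columns no longer differ in scale.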
Please post if you have any ideas!
P.S. (I provided the matrices to the functions manually: X_buf is the normalized version and X_basic is the original; Y is the vector of all examples, Q is the theta vector, and alpha is the learning rate.)
function [theta, J_history] = gradientDescentMulti(X, Y, theta, alpha, num_iters)
    m = length(Y);                      % number of training examples
    J_history = zeros(num_iters, 1);    % cost after each iteration
    for iter = 1:num_iters
        % Vectorized simultaneous update of all parameters
        theta = theta - (alpha/m) * X' * (X*theta - Y);
        J_history(iter) = computeCostMulti(X, Y, theta);
    end
end
And the second function:
function J = computeCostMulti(X, Y, theta)
    m = length(Y);                      % number of training examples
    residual = X*theta - Y;             % prediction error for each example
    J = (residual' * residual) / (2*m); % half the mean squared error
end
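For anyone who wants to reproduce the behaviour without the course files, here is a small self-contained sketch. The data is made up (it is not the course data set), alpha = 0.01 and 10 iterations are arbitrary choices, and the update and cost are written inline so the script does not depend on the two functions above.

```octave
% Made-up data: one large-scale feature, one small-scale feature, and a target.
X_raw = [2104 3; 1600 3; 2400 3; 1416 2; 3000 4];
Y     = [400; 330; 369; 232; 540];
m     = length(Y);

X_basic = [ones(m, 1) X_raw];                               % original scale
X_buf   = [ones(m, 1) (X_raw - mean(X_raw)) ./ std(X_raw)]; % normalized

alpha     = 0.01;   % arbitrary choice for this sketch
num_iters = 10;     % kept small so the diverging cost stays finite

theta_basic = zeros(3, 1);
theta_buf   = zeros(3, 1);
J_basic = zeros(num_iters, 1);
J_buf   = zeros(num_iters, 1);

for iter = 1:num_iters
    % Same update rule applied to both versions of the data
    theta_basic = theta_basic - (alpha/m) * X_basic' * (X_basic*theta_basic - Y);
    theta_buf   = theta_buf   - (alpha/m) * X_buf'   * (X_buf*theta_buf     - Y);
    J_basic(iter) = sum((X_basic*theta_basic - Y).^2) / (2*m);
    J_buf(iter)   = sum((X_buf*theta_buf     - Y).^2) / (2*m);
end

fprintf('original data:   J(1) = %g, J(end) = %g\n', J_basic(1), J_basic(end));
fprintf('normalized data: J(1) = %g, J(end) = %g\n', J_buf(1),   J_buf(end));
```

On this toy data the cost on X_buf goes down at every iteration, while the cost on X_basic grows at every iteration, which matches what I see in pic.1 and pic.2.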