I'm implementing simple gradient descent in Octave, but it's not working. Here is the data I'm using:
X = [1 2 3
1 4 5
1 6 7]
y = [10
11
12]
theta = [0
0
0]
alpha = 0.001 and itr = 50
This is my gradient descent implementation:
function theta = Gradient(X, y, theta, alpha, itr)
  % Batch gradient descent: update every component of theta on each pass
  m = length(y);   % number of training examples
  for i = 1:itr
    % compute the new components using the current theta (simultaneous update)
    th1 = theta(1) - alpha * (1/m) * sum((X * theta - y) .* X(:, 1));
    th2 = theta(2) - alpha * (1/m) * sum((X * theta - y) .* X(:, 2));
    th3 = theta(3) - alpha * (1/m) * sum((X * theta - y) .* X(:, 3));
    theta(1) = th1;
    theta(2) = th2;
    theta(3) = th3;
  end
end
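For reference, this is roughly how I set everything up and call the function (just a sketch of my driver script, using the data above):

X = [1 2 3; 1 4 5; 1 6 7];
y = [10; 11; 12];
theta = [0; 0; 0];
alpha = 0.001;
itr = 50;
theta = Gradient(X, y, theta, alpha, itr)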
My questions are:

- It produces some values of theta, which I then use in

  theta * [1 2 3]

  and expect the output to be around 10 (the first value of y). Is that the correct way to test the hypothesis [h(x) = theta' * x]? (See the snippet after this list for what I mean.)
- How can I determine how many iterations it should run? If I give it 1500 iterations, theta becomes extremely small (the values are printed in e notation).
- If I use double-digit numbers in X, theta becomes very small again, even with fewer than 5 iterations.
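To make the shapes explicit, this is my understanding of the hypothesis check for the first training example (the column vector x below is just the first row of X transposed):

x = [1; 2; 3];    % first training example as a column vector
h = theta' * x    % hypothesis h(x) = theta' * x; I expect this to be near y(1) = 10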
I've been struggling with this for a long time now and haven't been able to resolve it myself.
Sorry for bad formatting.