
I'm coding linear regression using gradient descent, with plain Python for loops rather than tensor operations.

I think my code is logically right, and when I plot the graph, the theta values and the fitted line look good. But the value of the cost function is high. Can you help me?


The value of the cost function is 1,160,934, which seems abnormally high.


def gradient_descent(alpha, x, y, ep=0.0001, max_repeat=10000000):
    m = x.shape[0]
    converged = False
    repeat = 0
    theta0 = 1.0
    theta3 = -1.0
#     J = sum([(theta0 + theta3*x[i] - y[i])**2 for i in range(m)]) / 2*m
    J = 1

    while not converged:
        # gradients of the cost w.r.t. theta0 and theta3
        grad0 = sum([(theta0 + theta3*x[i] - y[i]) for i in range(m)]) / m
        grad1 = sum([(theta0 + theta3*x[i] - y[i])*x[i] for i in range(m)]) / m

        # simultaneous update of both parameters
        temp0 = theta0 - alpha*grad0
        temp1 = theta3 - alpha*grad1

        theta0 = temp0
        theta3 = temp1

        msqe = (sum([(theta0 + theta3*x[i] - y[i])**2 for i in range(m)])) * (1 / 2*m)
        print(theta0, theta3, msqe)
        if abs(J - msqe) <= ep:
            print('Converged, iterations: {0} !!!'.format(repeat))
            converged = True

        J = msqe
        repeat += 1

        if repeat == max_repeat:
            converged = True
            print("reached max_repeat")

    return theta0, theta3, J

[theta0, theta3, J] = gradient_descent(0.001, X3, Y, ep=0.0000001, max_repeat=1000000)

print("************\n theta0 : {0}\ntheta3 : {1}\nJ : {2}\n"
      .format(theta0, theta3, J))

This is the data set.


2 Answers


I think the dataset itself is quite spread out, and that's why the best-fit line still shows a large value for the cost function. If you scale your data, you should see it drop significantly.
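A minimal sketch of what scaling does to the cost value, using synthetic data in place of the question's X3/Y (all the numbers below are made up for illustration; `np.polyfit` is used only to get the best-fit line in closed form):

```python
import numpy as np

# Synthetic stand-in for the question's data: a widely spread target.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 100.0, size=200)
y = 30.0 * x + 500.0 + rng.normal(0.0, 300.0, size=200)

def cost(theta0, theta1, x, y):
    # J = sum((h - y)^2) / (2*m), the same cost the question is computing
    m = x.shape[0]
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)

# Best-fit line on the raw data (closed form, just to evaluate the minimum cost).
t1, t0 = np.polyfit(x, y, 1)
raw_cost = cost(t0, t1, x, y)

# Standardize feature and target: (v - mean(v)) / std(v).
xs = (x - x.mean()) / x.std()
ys = (y - y.mean()) / y.std()
t1s, t0s = np.polyfit(xs, ys, 1)
scaled_cost = cost(t0s, t1s, xs, ys)

print(raw_cost)     # tens of thousands: residuals are measured in y's large units
print(scaled_cost)  # far below 1 after standardizing
```

Note that standardizing x alone mainly helps gradient descent converge; the cost is measured in y's units squared, so it is rescaling y that shrinks the number itself.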

  • can you explain once again about scaling my data? Actually, I scaled it using log1p already... which means this value is not wrong or something? – danny lee Oct 05 '20 at 17:45
  • Scaling with a log has an effect akin to folds: for example, what was twice becomes 2. But if the data is spread over a vast scale, I would advise this transformation: `(x - mean(x)) / std(x)`, where std is the standard deviation of x. This should make your cost significantly lower. Note, however, that it will not change your model parameters/coefficients by a large margin. Model performance, as I said, is not measured by your cost; cost is merely something you monitor to see whether it is decreasing with each iteration when you run your algorithm. Hope it helps. – Bharath Oct 06 '20 at 03:46
  • If it doesn't, let me know - I will explain in greater detail. – Bharath Oct 06 '20 at 03:49
  • oh, you mean scaling x3 with (x - mean(x)) / std(x) instead of taking the log! – danny lee Oct 06 '20 at 04:43
  • oh, and another thing: I think the cost function is a way to check whether my linear model fits the dataset. But since the value is very big, how can I tell whether mine is good or not? I was plotting my linear model (h = theta0 + theta3*x3) and the dataset at the same time. Is that the only way to check? – danny lee Oct 06 '20 at 05:31
  • Look at your model this way: is the regression line you have drawn the best possible fit? There is nothing more to the cost function than that. If your cost function decreases over each iteration, that means your gradient descent is working well. How good the model is, is usually measured by stats like RMSE or MSE: (y_predicted - y_actual) squared. – Bharath Oct 06 '20 at 06:11
  • Thank you very much for helping me. I asked the last question because I was trying to add one more independent variable (x2) and checked the graph with that (which shows a 3-dimensional surface), and it was kind of hard to tell whether I did well or not, hahaha. Anyway, I checked that the cost function value is dropping, so I think I did well. Thank you for the help; the 'value of the cost function' was very confusing to me. – danny lee Oct 06 '20 at 06:33
  • No problem. For anything else you want to understand about linear regression, I would highly recommend the first 3 videos of Andrew Ng's course on Coursera, or the excellent explanations on Machine Learning Mastery. If you like, you can mark this as the accepted answer; that would help me with my points on Stack Overflow. Thank you. – Bharath Oct 06 '20 at 07:37
  • yeah, I would love to. How can I mark your answer as accepted? Sorry, I'm not used to Stack Overflow, as you can see. – danny lee Oct 06 '20 at 08:16
  • Just below the number you will see a tick mark that marks an answer as accepted. Thanks again, Danny. – Bharath Oct 06 '20 at 09:05
  • oh, the check mark in green? I did it. Thank you for helping. – danny lee Oct 06 '20 at 12:54
  • Yeah, that did it. Thanks, Danny. – Bharath Oct 06 '20 at 13:25

It is quite normal for the cost to be high when dealing with a large dataset that has huge variance. Moreover, your data involves big numbers, so the cost comes out large; normalizing the data will give you a better-scaled value, since normalized data doesn't need further scaling. To verify, try this: start with random weights and observe the cost at every iteration. If the cost fluctuates over a huge range, there might be some mistake; otherwise it's fine.
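The check described above might look like this (a sketch with made-up data; `alpha` and the iteration count are arbitrary choices, not from the question):

```python
import numpy as np

# Made-up roughly linear data for the check.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=100)
m = x.shape[0]

theta0, theta1 = rng.normal(size=2)  # start from random weights
alpha = 0.01
costs = []
for _ in range(5000):
    err = theta0 + theta1 * x - y
    theta0 -= alpha * err.mean()        # gradient step for the intercept
    theta1 -= alpha * (err * x).mean()  # gradient step for the slope
    costs.append(np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m))

# A healthy run: the cost falls steadily instead of fluctuating over a huge range.
print(costs[0], costs[-1])
```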

  • thank you, that's a relief. Then how can we know that the linear model I have made (h = theta0 + theta3*x3) is a good model for the dataset? I thought calculating the cost function was one way, but since the value is pretty high, I don't know whether it is good or not. Do we have to use seaborn and visualize them? – danny lee Oct 06 '20 at 04:51
  • To see how the solution converges to the minimum, print the weights and the cost at every iteration, and train the model with fewer epochs; that will give you some visibility. To cross-verify, you can perform linear regression with a standard library by going through its documentation. From the figure, I feel that your regression line is very close to the best fit. – gilf0yle Oct 06 '20 at 10:43
  • oh ok, thanks. I will try to run it with tensors and compare the figures. – danny lee Oct 06 '20 at 12:55
  • do tell if you find something new – gilf0yle Oct 06 '20 at 14:45