Where did I go wrong in numpy normalization of input data in linear regression?

Question

While working through Exercise 1 of Andrew Ng's Machine Learning course in Python, I had to predict the price of a house from its size in square feet and its number of bedrooms, using multivariable linear regression.

In one of the steps, we had to predict the price of the house for a new example X = [1, 1650, 3], where 1 is the bias term, 1650 is the size of the house and 3 is the number of bedrooms. I used the code below to normalize the input and predict the output:

X_vect = np.array([1,1650,3])
X_vect[1:3] = (X_vect[1:3] - mu)/sigma
pred_price = np.dot(X_vect,theta)
print("the predicted price for 1650 sq-ft,3 bedroom house is ${:.0f}".format(pred_price))

Here mu is the mean of the training set, calculated previously as [2000.68085106 3.17021277]; sigma is the standard deviation of the training data, calculated previously as [7.86202619e+02 7.52842809e-01]; and theta is [340412.65957447 109447.79558639 -6578.3539709 ]. The value of X_vect after the calculation was [1 0 0]. Hence the prediction code:

pred_price = np.dot(X_vect,theta_vals[0])

gave the result "the predicted price for 1650 sq-ft,3 bedroom house is $340413". But this was wrong according to the answer key. So I did the calculation manually, as below:

print((np.array([1650,3]).reshape(1,2) - np.array([2000.68085106,3.17021277]).reshape(1,2))/sigma)

This is the normalized form of the two features in X_vect, and the output was [[-0.44604386 -0.22609337]].

The next line of code to calculate the hypothesis was:

print(340412.65957447 + 109447.79558639*-0.44604386 + 6578.3539709*-0.22609337)

Or in cleaner code:

X1_X2 = (np.array([1650,3]).reshape(1,2) - np.array([2000.68085106,3.17021277]).reshape(1,2))/sigma
xo = 1
x1 = X1_X2[:,0:1]
x2 = X1_X2[:,1:2]
hThetaOfX = (340412.65957447*xo + 109447.79558639*x1 + 6578.3539709*x2)
print("The price of a 1650 sq-feet house with 3 bedrooms is ${:.02f}".format(hThetaOfX[0][0]))

This gave a predicted price of $290106.82, which matched the answer key.

My question is: where did I go wrong in my first approach?

You only need to change the `X_vect` data type to float. `np.array([1,1650,3])` creates an integer array, so assigning the normalized values back into it casts them to integers and truncates them, giving `[1,0,0]` instead of `[ 1. , -0.44604386, -0.22609337]`. — FBruzzesi, Jul 31 '20 at 13:50
Thanks a lot! I didn't realize that was my mistake. After changing it to X_vect = np.array([1,1650,3],dtype = "f"), I got the correct answer. — Savannah Madison, Jul 31 '20 at 13:57
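
The effect described in the comments can be reproduced in isolation. The following is a minimal sketch, assuming only the `mu` and `sigma` values quoted in the question; the names `x_int` and `x_float` are illustrative and do not come from the original code. It uses `np.float64` rather than the `"f"` (float32) dtype mentioned above, which is a choice made here for extra precision, not something the thread prescribes.

```python
import numpy as np

# Values quoted in the question
mu = np.array([2000.68085106, 3.17021277])
sigma = np.array([7.86202619e+02, 7.52842809e-01])

# Built from integer literals, so NumPy infers an integer dtype
x_int = np.array([1, 1650, 3])
# The right-hand side is computed in floating point, but storing it
# into an integer array casts each value back to an integer
x_int[1:3] = (x_int[1:3] - mu) / sigma
print(x_int.dtype, x_int)

# Same steps with an explicit float dtype: the fractions survive
x_float = np.array([1, 1650, 3], dtype=np.float64)
x_float[1:3] = (x_float[1:3] - mu) / sigma
print(x_float.dtype, x_float)
```

Writing the literals as floats, for example `np.array([1.0, 1650.0, 3.0])`, should have the same effect as passing `dtype` explicitly.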
