While working through Exercise 1 of Andrew Ng's Machine Learning course in Python, I had to predict the price of a house from its size in square feet and its number of bedrooms, using multivariable linear regression.
In one of the steps we had to predict the price for a new example X = [1, 1650, 3], where 1 is the bias term, 1650 is the size of the house and 3 is the number of bedrooms. I used the code below to normalize the features and predict the output:
X_vect = np.array([1,1650,3])
X_vect[1:3] = (X_vect[1:3] - mu)/sigma
pred_price = np.dot(X_vect,theta)
print("the predicted price for 1650 sq-ft,3 bedroom house is ${:.0f}".format(pred_price))
Here mu is the mean of the training set, calculated previously as [2000.68085106 3.17021277], sigma is the standard deviation of the training data, calculated previously as [7.86202619e+02 7.52842809e-01], and theta is [340412.65957447 109447.79558639 -6578.3539709 ]. The value of X_vect after the calculation was [1 0 0]. Hence the prediction code:
pred_price = np.dot(X_vect,theta_vals[0])
gave the result: the predicted price for 1650 sq-ft,3 bedroom house is $340413.
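A minimal sketch of one thing worth checking in the first approach, assuming X_vect was created exactly as shown, from integer literals. NumPy then gives the array an integer dtype, and assigning floats into a slice of an integer array casts them to integers, which drops the fractional part. The names X_int and X_float are made up for this illustration; mu and sigma are the values quoted above.

```python
import numpy as np

# Values quoted in the question.
mu = np.array([2000.68085106, 3.17021277])
sigma = np.array([7.86202619e+02, 7.52842809e-01])

# Built from integer literals, so the array has an integer dtype.
X_int = np.array([1, 1650, 3])
# The right-hand side is float, but it is cast to int on assignment,
# so values such as -0.446 and -0.226 are truncated to 0.
X_int[1:3] = (X_int[1:3] - mu) / sigma
print(X_int.dtype, X_int)

# Same steps, but with an explicit float dtype (hypothetical fix).
X_float = np.array([1, 1650, 3], dtype=float)
X_float[1:3] = (X_float[1:3] - mu) / sigma
print(X_float.dtype, X_float)
```

With the float array the normalized features survive the assignment, so np.dot(X_float, theta) no longer collapses to the first element of theta.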
But this was wrong according to the answer key, so I did the calculation manually, as below:
print((np.array([1650,3]).reshape(1,2) - np.array([2000.68085106,3.17021277]).reshape(1,2))/sigma)
This is the normalized form of the two features in X_vect, and the output was [[-0.44604386 -0.22609337]].
The next line of code to calculate the hypothesis was:
print(340412.65957447 + 109447.79558639*-0.44604386 + 6578.3539709*-0.22609337)
Or in cleaner code:
X1_X2 = (np.array([1650,3]).reshape(1,2) - np.array([2000.68085106,3.17021277]).reshape(1,2))/sigma
xo = 1
x1 = X1_X2[:,0:1]
x2 = X1_X2[:,1:2]
hThetaOfX = (340412.65957447*xo + 109447.79558639*x1 + 6578.3539709*x2)
print("The price of a 1650 sq-feet house with 3 bedrooms is ${:.02f}".format(hThetaOfX[0][0]))
This gave a predicted price of $290106.82, which matched the answer key.
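For comparison, the manual calculation can also be written as a single dot product. This is only a sketch: theta_listed copies theta as quoted earlier, while theta_typed copies the coefficients as typed in the manual calculation, where the third one appears with a positive sign. Both names are invented for this illustration, and the sketch does not try to settle which sign the trained model actually has.

```python
import numpy as np

# Values quoted in the question.
mu = np.array([2000.68085106, 3.17021277])
sigma = np.array([7.86202619e+02, 7.52842809e-01])

# theta as listed earlier in the question (third coefficient negative).
theta_listed = np.array([340412.65957447, 109447.79558639, -6578.3539709])
# Coefficients as typed in the manual calculation (third coefficient positive).
theta_typed = np.array([340412.65957447, 109447.79558639, 6578.3539709])

# Bias term followed by the two normalized features, all as floats.
features = (np.array([1650.0, 3.0]) - mu) / sigma
x = np.concatenate(([1.0], features))

price_typed = x @ theta_typed
price_listed = x @ theta_listed
print("with coefficients as typed manually: ${:.2f}".format(price_typed))
print("with theta as listed:                ${:.2f}".format(price_listed))
```

The two results differ only through the sign of the third coefficient, so it may be worth confirming which value the trained theta really holds before comparing against the answer key.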
My question is: where did I go wrong in my first approach?