I am trying to predict house prices with linear regression, but I cannot train the model: model.fit() raises an error.
Here is my code:
#importing dependencies
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
#data loading
dataset = pd.read_csv('/content/dataset - train.csv')
#data visualization
plt.xlabel('Area')
plt.ylabel('Price')
plt.scatter(dataset['LotArea'], dataset['SalePrice'], color='red', marker='*')
#splitting data into features and target
X = dataset.drop(['SalePrice'], axis = 1)
Y = dataset['SalePrice']  # the target is the sale price, the column dropped from X above
#data splitting into train and test data
X_train, X_test, Y_train, Y_train = train_test_split(X, Y, test_size=0.2, random_state=0)
#training the model
model = LinearRegression()
model.fit(X_train, Y_train)
The error I get:
ValueError Traceback (most recent call last)
<ipython-input-31-a42a894194a6> in <module>()
1 model = LinearRegression()
----> 2 model.fit(X_train, Y_train)
3 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
332 raise ValueError(
333 "Found input variables with inconsistent numbers of samples: %r"
--> 334 % [int(l) for l in lengths]
335 )
336
ValueError: Found input variables with inconsistent numbers of samples: [1168, 292]
Please help me resolve this problem.
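
A likely cause, going by the numbers in the error: the train_test_split line unpacks into X_train, X_test, Y_train, Y_train, so the name Y_train is assigned twice and ends up holding the test labels (292 rows), while X_train has 1168 rows. Below is a minimal sketch that reproduces the mismatch and then fixes it by using four distinct names. The DataFrame is synthetic stand-in data (an assumption, since the CSV is not available here), sized at 1460 rows to match 1168 + 292.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV (assumption: 1460 rows, two numeric columns)
rng = np.random.default_rng(0)
lot_area = rng.uniform(1000, 20000, size=1460)
dataset = pd.DataFrame({
    "LotArea": lot_area,
    "SalePrice": 50000 + 10 * lot_area + rng.normal(0, 5000, size=1460),
})

X = dataset.drop(["SalePrice"], axis=1)  # features
Y = dataset["SalePrice"]                 # target: the sale price

# Reproduce the problem: Y_train appears twice, so the last value
# returned (the test labels) overwrites the training labels
X_train, X_test, Y_train, Y_train = train_test_split(
    X, Y, test_size=0.2, random_state=0
)
buggy_lengths = (len(X_train), len(Y_train))  # lengths no longer match

# Fix: unpack into four distinct names
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, Y_train)          # lengths now agree
r2 = model.score(X_test, Y_test)     # R^2 on the held-out rows
```

Two things worth checking in the real data as well: Y should be the column you want to predict (SalePrice for a price model), and LinearRegression only accepts numeric features without missing values, so if the CSV contains text or NaN columns (I cannot tell from the post) fit() will raise a different error until those are dropped or encoded.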