How to fix: 'ValueError: Found input variables with inconsistent numbers of samples'

Question

I am trying to predict house prices with linear regression, but I cannot train the model: model.fit() raises an error.

Here is my code:

#importing dependencies
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

#data loading
dataset = pd.read_csv('/content/dataset - train.csv')

#data visualization
plt.xlabel('Area')
plt.ylabel('Price')
plt.scatter(dataset['LotArea'], dataset['SalePrice'], color='red', marker='*')

#splitting data into features and target
X = dataset.drop(['SalePrice'], axis = 1)
Y = dataset['LotArea']

#data splitting into train and test data
X_train, X_test, Y_train, Y_train = train_test_split(X, Y, test_size=0.2, random_state=0)

#training the model
model = LinearRegression()
model.fit(X_train, Y_train)

The error I get:

ValueError                                Traceback (most recent call last)
<ipython-input-31-a42a894194a6> in <module>()
      1 model = LinearRegression()
----> 2 model.fit(X_train, Y_train)

/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    332         raise ValueError(
    333             "Found input variables with inconsistent numbers of samples: %r"
--> 334             % [int(l) for l in lengths]
    335         )
    336 

ValueError: Found input variables with inconsistent numbers of samples: [1168, 292]

Please help me resolve this problem.
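One way to see where the two numbers in the message come from is to print the shapes right after the split. The sketch below is only a reproduction under an assumption: it swaps the real CSV for a hypothetical synthetic DataFrame with 1460 rows (1168 + 292 = 1460, matching the error) and keeps the question's split line unchanged. Because the fourth name repeats Y_train, the last value returned by train_test_split (the test targets) overwrites the training targets.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the CSV: 1460 rows, since 1168 + 292 = 1460
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "LotArea": rng.integers(1_000, 20_000, size=1460),
    "SalePrice": rng.integers(50_000, 500_000, size=1460),
})

X = dataset.drop(["SalePrice"], axis=1)
Y = dataset["LotArea"]

# Same unpacking as in the question: Y_train is assigned twice,
# so it ends up holding the test targets (the last returned value)
X_train, X_test, Y_train, Y_train = train_test_split(X, Y, test_size=0.2, random_state=0)

print(X_train.shape, Y_train.shape)  # (1168, 1) (292,)

try:
    LinearRegression().fit(X_train, Y_train)
except ValueError as err:
    # Same "inconsistent numbers of samples" error as in the question
    print(err)
```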

Your `Y_train` appears twice in the split; you probably meant `Y_test`. — dx2-66, Jun 27 '22 at 10:20
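A minimal corrected sketch follows, again using a hypothetical synthetic DataFrame in place of '/content/dataset - train.csv' so that it runs on its own. It unpacks the split in the order train_test_split returns it (X_train, X_test, Y_train, Y_test). It also assumes the target should be SalePrice rather than LotArea. The question is about predicting house prices and already drops SalePrice from X, whereas with Y = dataset['LotArea'] the target column would remain among the features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the CSV; replace with pd.read_csv(...) in practice
rng = np.random.default_rng(0)
lot_area = rng.integers(1_000, 20_000, size=1460)
dataset = pd.DataFrame({
    "LotArea": lot_area,
    # Synthetic price loosely tied to lot area, for illustration only
    "SalePrice": 50_000 + 10 * lot_area + rng.normal(0, 20_000, size=1460),
})

# Features: everything except the price; target: the price itself (assumed)
X = dataset.drop(["SalePrice"], axis=1)
Y = dataset["SalePrice"]

# train_test_split returns the parts in input order: train, test for each array
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
assert len(X_train) == len(Y_train)  # both have 1168 rows

model = LinearRegression()
model.fit(X_train, Y_train)
print("R^2 on test data:", model.score(X_test, Y_test))
```

Once each name appears exactly once on the left-hand side, X_train and Y_train have the same number of rows and fit() no longer raises the error.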