I'm trying to build a simple regression line with pandas in Spyder. After executing the following code, I got this error:

Found input variables with inconsistent numbers of samples: [1, 99]

The code:

import numpy as np
import pandas as pd

dataset = pd.read_csv('Phil.csv')

x = dataset.iloc[:, 0].values
y = dataset.iloc[:, 2].values

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x, y)

I think I know what the problem is, but I'm not sure how to handle the syntax. In the variable explorer, the size of x (and of y) is (99L,). From what I remember, it can't be a 1-D vector and must have size (99,1). The same goes for y.

I saw a bunch of related topics, but none of them helped.

Dmitriy
sheldonzy
`y` can just be `(99,)` (it need not be of shape `(99,1)`), but `X` must be 2-D. Try `x = x.reshape(-1,1)` before fitting. – Vivek Kumar Aug 16 '17 at 05:44

2 Answers

Referring to the sklearn documentation for LinearRegression (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit), X needs to have the shape [n_samples, n_features].

Since you have only a single feature with many samples, the shape should be (99,1), i.e., a single value per "row" with a single "column".

There are many ways to accomplish this (ref: Efficient way to add a singleton dimension to a NumPy vector so that slice assignments work). In your case, the following should work:

regressor.fit(x[:, None], y)
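
For reference, here is a small sketch of a few equivalent ways to add that column axis. It uses a synthetic array as a stand-in for the data from Phil.csv, so the values are an assumption and only the shapes matter:

```python
import numpy as np

# Synthetic stand-in for the question's x (assumption: 99 samples, 1 feature).
x = np.arange(99, dtype=float)
print(x.shape)  # 1-D array of length 99

# Three equivalent ways to turn the 1-D array into a single-column 2-D array.
a = x[:, None]
b = x[:, np.newaxis]
c = x.reshape(-1, 1)

print(a.shape, b.shape, c.shape)
```

All three produce an array with 99 rows and 1 column; which one to use is mostly a matter of taste.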

Don't forget that predict requires its input in the same 2-D shape!
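
As a rough end-to-end sketch, assuming made-up data that follows y = 2x + 1 (not the contents of Phil.csv), fitting and predicting could look like this. The name new_x is hypothetical and only used for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only: y is an exact linear function of x.
x = np.arange(99, dtype=float)   # 1-D, 99 samples
y = 2.0 * x + 1.0                # 1-D target is fine

regressor = LinearRegression()
regressor.fit(x.reshape(-1, 1), y)   # features must be 2-D: (n_samples, 1)

# New inputs need the same layout: one row per sample, one column per feature.
new_x = np.array([100.0, 101.0])
predictions = regressor.predict(new_x.reshape(-1, 1))
print(predictions)
```

Passing new_x without the reshape would fail for the same reason as the original fit call: scikit-learn expects a 2-D feature array.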

Peter Mularien

I ran into a similar issue:

ValueError: Found input variables with inconsistent numbers of samples: [20, 10]

I found a solution, though. In my case, the order of the variables unpacked from train_test_split was incorrect.

I wrote:

X_train, X_test, y_test, y_train = train_test_split(X,y,test_size=1/3, random_state=0)

instead of:

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=1/3, random_state=0) 
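
A quick way to catch this kind of mix-up is to compare the sample counts of each split before fitting. The sketch below uses synthetic data (an assumption, not my original dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 30 samples, 1 feature.
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

# train_test_split returns the train/test pair for each input in turn:
# first the two pieces of X, then the two pieces of y.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

# Features and targets of the same split must have matching sample counts.
print(X_train.shape[0], y_train.shape[0])
print(X_test.shape[0], y_test.shape[0])
```

If the two numbers on either line differ, the unpacking order is the first thing to check.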

Hope it helps future coders who run into similar errors.

Farouk Yahaya