
I am having trouble with a 70/30 train/test split in sklearn. I get an error on this line:

X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)

The error is:

Found input variables with inconsistent numbers of samples

Context

from imblearn.over_sampling import SMOTE
    
sm = SMOTE(k_neighbors = 1)
X = data.drop('cluster',axis=1)
y = data['cluster']
    
X_smote, y_smote= sm.fit_sample(X,y)
    
data_bal = pd.DataFrame(columns=X.columns.values, data=X_smote)
data_bal['cluster']=y_smote
    
from sklearn.model_selection import  train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
y_train.value_counts().plot(kind='bar')

Edit

I solved the error: I just had to change stratify=y to stratify=y_smote, so that the stratification labels have the same number of samples as the resampled data.
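A minimal sketch of the fix, for illustration only. To keep it self-contained it does not call SMOTE: the naive duplication of minority rows below is an assumed stand-in for the oversampling step, and the data and the names X_res and y_res are hypothetical. The point it shows is that stratify must receive the resampled labels, not the original ones.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 20 samples of class 0, 6 of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(26, 3))
y = np.array([0] * 20 + [1] * 6)

# Assumed stand-in for SMOTE: repeat random minority rows until classes balance.
minority_idx = np.flatnonzero(y == 1)
extra_idx = rng.choice(minority_idx, size=14, replace=True)
X_res = np.vstack([X, X[extra_idx]])
y_res = np.concatenate([y, y[extra_idx]])

# Stratify on the resampled labels, which have the same length as X_res.
X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.3, stratify=y_res, random_state=42
)

# Class counts in each split; stratification keeps them balanced.
print(np.bincount(y_train), np.bincount(y_test))
```

Passing stratify=y here instead would fail, because y has 26 entries while X_res and y_res have 40.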

Paip
  • https://stackoverflow.com/questions/30813044/sklearn-found-arrays-with-inconsistent-numbers-of-samples-when-calling-linearre I think this is the same issue – Abhinav Mathur Oct 20 '20 at 04:53
  • Hi, I tried that solution and it still doesn't work, thank you – Paip Oct 20 '20 at 11:02

2 Answers

1

An observation about this line of your code:

X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)

This error is typically raised when one of the inputs does not have the same number of samples (length along the first dimension) as the other inputs.

Check the length and/or dimensions of X_smote, y_smote and y to see if they are all as expected.
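One way to run that check is sketched below. The arrays are small hypothetical stand-ins for X_smote, y_smote and y (not the asker's data). It uses sklearn.utils.check_consistent_length, which raises the same kind of ValueError when the inputs disagree on the number of samples.

```python
import numpy as np
from sklearn.utils import check_consistent_length

# Hypothetical stand-ins: the resampled arrays have 10 rows,
# while the original labels y have only 6.
X_smote = np.zeros((10, 4))
y_smote = np.zeros(10, dtype=int)
y = np.zeros(6, dtype=int)

# Print each shape so a mismatch is easy to spot.
for name, arr in [("X_smote", X_smote), ("y_smote", y_smote), ("y", y)]:
    print(name, arr.shape)

# Validate all three together and capture the error message, if any.
try:
    check_consistent_length(X_smote, y_smote, y)
    message = "all inputs have the same number of samples"
except ValueError as exc:
    message = str(exc)

print(message)
```

Any array passed to train_test_split, including the one given to stratify, needs to pass this check against the others.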

  • Hi, X_smote has a shape of (13993, 308), and y_smote has a shape of (13993,) – Paip Oct 19 '20 at 23:33
  • Ok. Can you tell me the result of running the script when you change the line of code to: X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, random_state=42) The only thing that isn't clear to me is the input parameter "stratify=y". I'd like to see if your code returns the error without it. – Darryl Strachan Oct 20 '20 at 17:14
  • Hi, I already solved the error, the problem was in "stratify=y", it must be "stratify=y_smote". Thank you friend – Paip Oct 21 '20 at 21:12
0

I got the same issue, but when I changed

x_train,y_train,x_test,y_test = train_test_split(x,y,test_size=0.25,random_state=42)

to

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.25,random_state=42)

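the error went away. train_test_split returns its outputs in the order x_train, x_test, y_train, y_test, so the original unpacking assigned the test features to the name y_train.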
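A small sketch of the return order, using hypothetical toy arrays rather than real data. Each row of x is built so that its first value divided by 2 equals the matching label, which makes it easy to confirm that features and labels stay aligned after the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: row i of x is [2*i, 2*i + 1], and its label is i.
x = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Correct unpacking order: both feature splits first, then both label splits.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

# Features and labels remain row-aligned within each split.
print(np.array_equal(x_train[:, 0] // 2, y_train))
```

With the names in the wrong order, a later call such as a model's fit(x_train, y_train) would receive arrays of different lengths, which triggers the same inconsistent-samples error.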