I have been trying to use machine learning to predict some data, but I get a "can not convert str into int" error. I tried using LabelEncoder, but I still cannot get the program to run.

Here is my attempt with label encoding, followed by the traceback it produces:

import pandas as pd 
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder



gender_data = pd.read_csv('gender.csv')

le = LabelEncoder()


X = gender_data.drop(columns=['Gender'])
y = gender_data['Gender']
Xv = X.values
yv = y.values

le_encoder_X = le.fit(Xv)
le_encoded_X = le.transform(Xv)


le_encoder_y = le.fit(yv)
le_encoded_y = le.transform(yv)

X_train, X_test, y_train, y_test = train_test_split(le_encoded_X, le_encoded_y, test_size=0.2)



model = DecisionTreeClassifier()
model.fit(X_train, y_train)

ValueError                                Traceback (most recent call last)
 in ()
     17 yv = y.values
     18
---> 19 le_encoder_X = le.fit(Xv)
     20 le_encoded_X = le.fit(Xv)
     21

F:\Anaconda\lib\site-packages\sklearn\preprocessing\label.py in fit(self, y)
     93         self : returns an instance of self.
     94         """
---> 95         y = column_or_1d(y, warn=True)
     96         self.classes_ = np.unique(y)
     97         return self

F:\Anaconda\lib\site-packages\sklearn\utils\validation.py in column_or_1d(y, warn)
    612         return np.ravel(y)
    613
--> 614     raise ValueError("bad input shape {0}".format(shape))
    615
    616

ValueError: bad input shape (66, 4)

JustBuster
  • It looks like you missed a step or made a typo. Should the `le_encoded_X = ...` steps call `le.transform()`? – G. Anderson Aug 14 '19 at 20:09
  • It was `transform` earlier, but it still didn't work. – JustBuster Aug 14 '19 at 20:13
  • According to [the docs](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder), `LabelEncoder.fit()` takes an "array-like of shape (n_samples,)", but you've passed in multiple columns. In other words, it wants shape `(66,)` but you've given it `(66, 4)`. You need to either use a different encoder or operate on a single column at a time. – G. Anderson Aug 14 '19 at 20:21
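
A minimal sketch of the two options mentioned in the last comment, not a definitive fix. Since `gender.csv` is not shown in the question, the DataFrame below is a hypothetical stand-in: the column names `Color`, `Music`, `Beverage` and `SoftDrink` and all values are assumptions made only for illustration. It also assumes a scikit-learn version that provides `OrdinalEncoder` (0.20 or later).

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for gender.csv: four string features plus 'Gender'.
gender_data = pd.DataFrame({
    "Color": ["Cool", "Neutral", "Warm", "Warm", "Cool", "Neutral"],
    "Music": ["Rock", "Pop", "Rock", "Jazz", "Pop", "Jazz"],
    "Beverage": ["Vodka", "Wine", "Beer", "Wine", "Beer", "Vodka"],
    "SoftDrink": ["7UP", "Coke", "Coke", "Fanta", "Fanta", "7UP"],
    "Gender": ["F", "F", "M", "F", "M", "M"],
})

X = gender_data.drop(columns=["Gender"])
y = gender_data["Gender"]

# Option 1: a different encoder. OrdinalEncoder accepts a 2-D input
# and encodes each feature column independently.
feature_encoder = OrdinalEncoder()
X_encoded = feature_encoder.fit_transform(X)

# Option 2: one column at a time, with a fresh LabelEncoder per column.
X_per_column = X.apply(lambda col: LabelEncoder().fit_transform(col))

# LabelEncoder is intended for the 1-D target, so it works on y directly.
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y_encoded, test_size=0.2, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Either option gives the model a purely numeric feature matrix with the original `(n_samples, n_features)` shape. Keeping a separate encoder for the target also means `label_encoder.inverse_transform(predictions)` can map predictions back to the original labels.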
