I am implementing an MLPClassifier and want to give it strings as input.

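import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

df = pd.DataFrame(results)
X = df.iloc[:, [2]].values  # feature column (strings)
y = df.iloc[:, [1]].values  # label column

X_train, X_test, y_train, y_test = train_test_split(X, y)
clf = MLPClassifier(random_state=6, max_iter=200).fit(X_train, y_train.ravel())
clf.predict(X_test)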

I am getting this error

Error

desertnaut

2 Answers

You need to convert your strings to a numeric format before you can apply most machine learning algorithms.

For example, if you have 10 classes, you convert them to integers from 0 to 9. (You can use sklearn to transform the data into this format, for instance with LabelEncoder.)
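
As a minimal sketch of the label-encoding step described above: the label strings here are made up for illustration and stand in for the real target column.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels standing in for the real target column.
labels = ["spam", "ham", "spam", "eggs", "ham"]

encoder = LabelEncoder()
# The distinct classes are sorted, then numbered from 0 upwards.
y = encoder.fit_transform(labels)

# inverse_transform maps integer predictions back to the original strings.
decoded = encoder.inverse_transform(y)
```

Keeping the fitted encoder around lets you turn the classifier's integer predictions back into readable class names.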

But it really depends on which type of data you have. You might also want to look at the one-hot encoding representation, which maps each occurrence of your categorical feature to an N-dimensional array, where N is the cardinality of the feature.
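
A small sketch of one-hot encoding with sklearn, assuming a single categorical feature column. The `colors` data is invented for the example; with three distinct values, N = 3.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical feature with three distinct values (N = 3).
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")

# fit_transform returns a sparse matrix by default; convert it to a dense array.
X_onehot = encoder.fit_transform(colors).toarray()

print(encoder.categories_)  # the learned categories, in sorted order
print(X_onehot.shape)       # one row per sample, one column per category
```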

A.B

Since you are using a pandas DataFrame, you can do this more easily. Getting the class label vector y is straightforward. Say the column name is 'label':

y = df['label'].factorize()[0]

If the columns have no names, just use the column number instead (in your case, df[1]).

Wondering why I took [0] from the factorization? pandas.factorize returns not only the integer codes, which are what we need here, but also the unique values of the column that those codes stand for (uniques).
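
To make the two return values concrete, here is a short sketch on a made-up 'label' column (the data is hypothetical):

```python
import pandas as pd

# Hypothetical label column for illustration.
df = pd.DataFrame({"label": ["cat", "dog", "cat", "bird"]})

# factorize returns a pair: integer codes and the unique values they refer to.
codes, uniques = df["label"].factorize()

# Codes are assigned in order of first appearance, so uniques[code] recovers the string.
recovered = uniques.take(codes)
```

Here `codes` is what you would pass as y, and `uniques` lets you translate predicted integers back into the original labels.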

Likewise, if any input feature column in the feature matrix X is categorical (and non-numeric), encode it numerically as well. There are two types of encoding for categorical variables:

  • Label encoding: If the values of the feature have an order or hierarchy, use this encoding.
  • One-hot encoding: If the values of the feature do not have any order or hierarchy, use this encoding technique.
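
The two cases above could be sketched roughly as follows. The column names and values are invented, and the example assumes sklearn's OrdinalEncoder for the ordered feature (so the order can be stated explicitly) and pandas.get_dummies for the unordered one. Other tools would work as well.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical feature matrix: 'size' is ordered, 'color' is not.
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "color": ["red", "green", "blue", "green"],
})

# Ordered feature: pass the order explicitly so small < medium < large.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = ordinal.fit_transform(df[["size"]]).ravel()

# Unordered feature: one indicator column per distinct value.
color_dummies = pd.get_dummies(df["color"], prefix="color")

# Combine both into a purely numeric matrix that an estimator can accept.
X = pd.concat(
    [pd.Series(size_codes, name="size"), color_dummies], axis=1
).to_numpy(dtype=float)
```

The resulting X contains only numbers, so it can be passed to MLPClassifier.fit together with an encoded y.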
hafiz031