I am training a Support Vector Machine using sklearn. In the beginning, I did not encode the class labels in any form and left them as strings (e.g. ["A", "B", "C"]). The resulting accuracy was comparable to when I used a LabelEncoder ([0, 1, 2]). So is sklearn automatically converting the strings into integers/one-hot representations in the background? Or am I missing something here?
1 Answer
You only have to encode labels yourself when they appear among the independent variables, i.e. in the feature matrix X. If your features contain string categories, you need to encode them with a LabelEncoder, a OneHotEncoder, or whatever fits your dataset best. For the target (the dependent variable y), scikit-learn's classifiers encode string class labels automatically in the background, so you don't have to write that code yourself. That is why fitting on ["A", "B", "C"] and fitting on [0, 1, 2] gives you comparable accuracy. Hope this helped!
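As a minimal sketch (using a synthetic dataset from make_classification purely for illustration), you can check that SVC accepts string targets directly and exposes them via its classes_ attribute, and that the accuracy matches a manually encoded target:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Synthetic 3-class data, just for illustration
X, y_int = make_classification(n_samples=300, n_classes=3,
                               n_informative=5, random_state=0)
y_str = np.array(["A", "B", "C"])[y_int]   # string class labels

# Fit directly on string labels -- the target encoding happens internally
clf_str = SVC().fit(X, y_str)
print(clf_str.classes_)                    # ['A' 'B' 'C']

# Fit on manually encoded labels -- same classes, just as integers
y_enc = LabelEncoder().fit_transform(y_str)
clf_enc = SVC().fit(X, y_enc)
print(clf_enc.classes_)                    # [0 1 2]

# The learned decision function is the same, so accuracy is identical
print((clf_str.predict(X) == y_str).mean())
print((clf_enc.predict(X) == y_enc).mean())
```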

A_not1234