from sklearn.preprocessing import LabelBinarizer

vs

from sklearn.preprocessing import LabelEncoder

What is the difference between LabelEncoder and LabelBinarizer, and when should each be used?

Thanks in advance.

Rohan Nadagouda
Nishant
  • [https://stackoverflow.com/questions/50473381/scikit-learns-labelbinarizer-vs-onehotencoder](https://stackoverflow.com/questions/50473381/scikit-learns-labelbinarizer-vs-onehotencoder) – Nafis Islam Dec 28 '18 at 10:08

1 Answer


LabelEncoder maps each category to a single integer code and does not create a dummy (indicator) variable per category, whereas LabelBinarizer creates one indicator column per category. Here is an example from the documentation.

from sklearn.preprocessing import LabelBinarizer, LabelEncoder

data1 = [1, 2, 2, 6]

lb = LabelBinarizer()
le = LabelEncoder()

# One indicator column per class (1, 2, 6)
print('LabelBinarizer output \n', lb.fit_transform(data1))
# LabelBinarizer output
#  [[1 0 0]
#  [0 1 0]
#  [0 1 0]
#  [0 0 1]]

# One integer code per class
print('LabelEncoder output \n', le.fit_transform(data1))
# LabelEncoder output
#  [0 1 1 2]

Hence, if you just want to encode the categories as integers 0, 1, 2, 3, etc., use LabelEncoder. If you want a dummy variable for each category, go for LabelBinarizer.
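As a further sketch of the same distinction, the snippet below uses string labels (the `labels` list and variable names are made up for illustration, not taken from the question). It shows that both encoders sort the classes, that both can map their output back with `inverse_transform`, and that LabelBinarizer collapses to a single column when there are only two classes.

```python
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

# Hypothetical example labels, chosen only for illustration
labels = ["cat", "dog", "dog", "bird"]

le = LabelEncoder()
codes = le.fit_transform(labels)      # one integer per sample
print(le.classes_)                    # classes are stored in sorted order
print(codes)

lb = LabelBinarizer()
onehot = lb.fit_transform(labels)     # one indicator column per class
print(onehot.shape)                   # (n_samples, n_classes)

# Both encoders can recover the original labels
print(le.inverse_transform(codes))
print(lb.inverse_transform(onehot))

# With exactly two classes, LabelBinarizer returns a single 0/1 column
binary = LabelBinarizer().fit_transform(["no", "yes", "yes"])
print(binary.shape)
```

Both classes are documented as transformers for target labels (y). For feature columns, scikit-learn provides OneHotEncoder and OrdinalEncoder, which play the analogous roles.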

Venkatachalam