from sklearn.preprocessing import LabelBinarizer
vs
from sklearn.preprocessing import LabelEncoder
What is the difference between LabelEncoder
and LabelBinarizer,
and when should each one be used?
Thanks in advance.
LabelEncoder
does not create a dummy variable for each category in your data,
whereas LabelBinarizer
does. Both are meant for target labels (y) rather than input features (X). Here is an example adapted from the documentation.
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

data1 = [1, 2, 2, 6]
lb = LabelBinarizer()
le = LabelEncoder()

# LabelBinarizer: one indicator column per class (the classes here are 1, 2, 6)
print('LabelBinarizer output \n', lb.fit_transform(data1))
# LabelBinarizer output
# [[1 0 0]
#  [0 1 0]
#  [0 1 0]
#  [0 0 1]]

# LabelEncoder: a single integer code per label
print('LabelEncoder output \n', le.fit_transform(data1))
# LabelEncoder output
# [0 1 1 2]
Hence, if you just want to encode the categories as integers 0, 1, 2, 3, etc., use LabelEncoder. If you want to create a dummy (indicator) variable for each category, use LabelBinarizer.
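To make the difference concrete, here is a rough pure-Python sketch of what each transform computes conceptually. The function names label_encode and label_binarize are made up for illustration and are not part of scikit-learn; this is only an approximation of the behaviour shown above, not the library's implementation.

```python
def label_encode(labels):
    """Map each label to its index in the sorted list of distinct labels."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return classes, [index[x] for x in labels]


def label_binarize(labels):
    """Build one 0/1 indicator column per distinct label."""
    classes, codes = label_encode(labels)
    rows = [[1 if code == j else 0 for j in range(len(classes))]
            for code in codes]
    return classes, rows


data1 = [1, 2, 2, 6]
classes, codes = label_encode(data1)
_, indicator = label_binarize(data1)

print(classes)    # [1, 2, 6]
print(codes)      # [0, 1, 1, 2]
print(indicator)  # [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
```

One place where this sketch differs from the real thing: with exactly two classes, scikit-learn's LabelBinarizer returns a single 0/1 column rather than two columns, whereas the sketch always emits one column per class.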