How to use LabelEncoder in sklearn pipeline?

Note: the following code works with OneHotEncoder but fails with LabelEncoder. How can I use LabelEncoder in this situation?

MWE

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import make_column_transformer
import sklearn

print(sklearn.__version__) # 0.22.2.post1

df = sns.load_dataset('titanic').head()

le = OneHotEncoder() # this works
# le = LabelEncoder() # this fails

ct = make_column_transformer(
    (le, ['sex','adult_male','alone']),
    remainder='drop')

ct.fit_transform(df)

$$\begin{align}\mathsf P(N\mid E)&=\dfrac{\mathsf P(N\cap E)}{\mathsf P(E)}\\[2ex]&=\dfrac{\mathsf P(N\cap E\mid F)\,\mathsf P(F)+\mathsf P(N\cap E\mid F^{\small\complement})\,\mathsf P(F^{\small\complement})}{\mathsf P(E\mid F)\,\mathsf P(F)+\mathsf P(E\mid F^{\small\complement})\,\mathsf P(F^{\small\complement})}\end{align}$$

BhishanPoudel
  • What error do you receive, and which line of code raises it? – PV8 Sep 15 '20 at 13:16
  • LabelEncoder() is meant for the response variable, so it expects a Series; you are applying it to a DataFrame, hence the error. https://stackoverflow.com/a/63822728/5114585 explains when to use OHE, OE, LE or LB – Dr Nisha Arora Sep 16 '20 at 03:12

2 Answers

According to the docs, OneHotEncoder accepts a DataFrame (a 2-D feature matrix X) and converts its categorical columns into one-hot vectors. LabelEncoder accepts a 1-D Series (your y, the dependent variable) and maps its values to integer labels.

OneHotEncoder's usage: fit_transform(X, y=None)

LabelEncoder's usage: fit_transform(y)

The ColumnTransformer passes both X and y to fit_transform, and LabelEncoder accepts only one of them. That is why you get: "fit_transform() takes 2 positional arguments but 3 were given"
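The signature mismatch can be reproduced without a ColumnTransformer. The following is a minimal sketch, assuming a small made-up list in place of the titanic columns; the variable names are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy inputs, not the real titanic data.
X = ["male", "female", "female"]
y = [0, 1, 1]

try:
    # Passing both X and y positionally, as a transformer inside a
    # ColumnTransformer would receive them.
    LabelEncoder().fit_transform(X, y)
    raised = False
except TypeError as exc:
    # LabelEncoder.fit_transform only accepts a single argument (y).
    raised = True
    print(exc)
```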

If you really want to use LabelEncoder, call its fit_transform on y directly. Here is a similar question: How to use sklearn Column Transformer?
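As a rough sketch of calling LabelEncoder on the target directly, assuming a hypothetical five-row target Series (the values below are made up for illustration):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for a target column; not loaded from seaborn.
y = pd.Series(["no", "yes", "yes", "no", "yes"], name="alive")

le = LabelEncoder()
# 1-D input, called outside of any ColumnTransformer.
y_encoded = le.fit_transform(y)

print(y_encoded)    # one integer code per row
print(le.classes_)  # sorted original labels; position i corresponds to code i
```

The fitted encoder can later map predictions back to the original labels with le.inverse_transform.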

Here are the docs:

  1. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
  2. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
Subbu VidyaSekar

LabelEncoder was designed specifically for encoding the target variable, y. That's why you can't use it to transform multiple columns at once, as you can with OneHotEncoder.

scikit-learn provides OrdinalEncoder for this case. It encodes multiple feature columns at once, mapping each column's categories to integer codes.
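A possible adaptation of the question's MWE with OrdinalEncoder swapped in is sketched below. It assumes a small hand-built DataFrame with the same column names instead of sns.load_dataset('titanic'), so the row values are made up.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows that mimic the titanic column names used in the question.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "female", "male"],
    "adult_male": [True, False, False, False, True],
    "alone": [False, False, True, False, True],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

ct = make_column_transformer(
    (OrdinalEncoder(), ["sex", "adult_male", "alone"]),
    remainder="drop",  # "fare" is dropped, as in the original MWE
)

# Each selected column is encoded independently to codes 0..n_categories-1.
encoded = ct.fit_transform(df)
print(encoded)
```

Note that the integer codes imply an ordering, which may or may not suit the downstream model; OneHotEncoder avoids that at the cost of more columns.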

Bex T.