Alternatives to LabelEncoder() for the target variable when using a pipeline

Question

I am developing a base classification model. I use ColumnTransformer and Pipeline for everything: feature engineering, feature selection, and model selection. I want to encode my categorical target (dependent) variable as numbers inside the pipeline. I learned that LabelEncoder cannot be used inside either a ColumnTransformer or a Pipeline, because its fit method takes only y, so it raises 'TypeError: fit_transform() takes 2 positional arguments but 3 were given.' What are the alternatives for the target variable? I found many similar questions, but they were about features, and the recommendations there were to use OneHotEncoder (OHE) or OrdinalEncoder.

score 0 · Answer 1 · answered Dec 22 '21 at 19:30

Basically, don't.

All (or at least most) sklearn classifiers will encode internally, and produce more useful information for you when they've been trained directly on the "real" target values. (E.g. predict will give the actual target values without you having to decode the mapping.)
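As a minimal sketch of what that looks like in practice (the toy data, column names, and the choice of LogisticRegression are made up for illustration, not taken from the question), string labels can be passed straight to a pipeline's fit:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data, invented for this example only.
X = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size": [1.0, 2.5, 1.2, 3.1, 2.4, 3.0],
})
y = ["cat", "dog", "cat", "bird", "dog", "bird"]  # string labels, not encoded

# Encode and scale the features only; the target is left untouched.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
    ("num", StandardScaler(), ["size"]),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)

pred = model.predict(X)           # predictions come back as the original strings
proba = model.predict_proba(X)    # columns follow the order of model.classes_
print(model.classes_)             # the label mapping the classifier learned
print(pred[:3])
```

If the numeric codes are needed later, the position of each label in classes_ gives the mapping, so no separate LabelEncoder step has to live in the pipeline.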

(As for regression, if the target is actually ordinal in nature, you may be able to use TransformedTargetRegressor. Whether this makes sense probably depends on the model type.)
