I am developing a base classification model. I use ColumnTransformer and Pipeline for everything: feature engineering, feature selection, and model selection. I wanted to encode my categorical target (dependent) variable as numeric inside the pipeline. I found out that LabelEncoder cannot be used inside either ColumnTransformer or Pipeline, because its fit only takes (y), so it throws the error 'TypeError: fit_transform() takes 2 positional arguments but 3 were given.' What are the alternatives for the target variable? I found many similar questions, but they were about features, and the recommendations were to use OneHotEncoder (OHE) and OrdinalEncoder.
1 Answer
Basically, don't.
All (or at least most) sklearn classifiers will encode the target internally, and produce more useful information for you when they've been trained directly on the "real" target values. (E.g. predict will give the actual target values without you having to decode the mapping.)
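To illustrate the point, here is a minimal sketch that fits a pipeline directly on string labels. The toy data, the step names, and the choice of StandardScaler and LogisticRegression are assumptions made for the example, not part of the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two numeric features and a string-valued target.
X = np.array([
    [1.0, 0.2],
    [2.5, 1.1],
    [1.2, 0.3],
    [3.1, 2.0],
    [2.4, 1.0],
    [3.0, 2.1],
])
y = np.array(["cat", "dog", "cat", "bird", "dog", "bird"])

# No LabelEncoder step: the classifier receives the raw string labels.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model.fit(X, y)

# Predictions come back as the original labels, not integer codes.
predictions = model.predict(X)

# The classifier records the label order it used internally; the columns of
# predict_proba follow this order.
classes = model.named_steps["clf"].classes_
probabilities = model.predict_proba(X)
```

Here classes holds the sorted unique labels, so the mapping between probability columns and labels stays available without a separate encoder.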
(As for regression, if the target is actually ordinal in nature, you may be able to use TransformedTargetRegressor. Whether this makes sense probably depends on the model type.)
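For reference, a minimal sketch of how TransformedTargetRegressor wraps a regressor. It assumes an already numeric target and uses a log/exp transform purely for illustration; the synthetic data and the LinearRegression choice are assumptions, and an ordinal target would need its own numeric coding before this step.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: log(y) is exactly linear in the single feature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(50, 1))
y = np.exp(1.5 * X[:, 0] + 0.5)

# The wrapper applies func to y before fitting the inner regressor and
# applies inverse_func to the inner regressor's predictions.
reg = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
reg.fit(X, y)

# Predictions are returned on the original scale of y.
pred = reg.predict(X)

# The fitted inner model is exposed as regressor_.
slope = reg.regressor_.coef_[0]
```

The target transformation lives inside the estimator, so the whole object can be placed at the end of a Pipeline like any other regressor.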

Ben Reiniger