Labels (y) in Scikit-learn Pipeline and Compose Classes

Question

My current understanding is that we can't directly transform or retrieve the y labels passed as (X, y) while using a Pipeline.

The fit_transform call at the end returns only the transformed X, and y is used only by methods such as fit() and fit_predict().

Is my understanding correct?

Also, is there a way to transform and retrieve y (including when dropping instances with a custom transformer) without having to break out of a fully enclosed model training pipeline?

https://stackoverflow.com/a/70191787/10375049 – Marco Cerliani Dec 02 '21 at 09:12

score 0 · Accepted Answer · answered Jul 08 '21 at 08:13


In general, your understanding is correct. Pipeline objects are meant for sequential application of several transformations of X. From the user guide:

Pipelines only transform the observed data (X).

Also have a look at the glossary entry for the term transform:

transform
In a transformer, transforms the input, usually only X, into some transformed space (conventionally notated as Xt).
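To make the two quotes above concrete, here is a minimal sketch, assuming a small synthetic dataset and an arbitrary choice of steps (StandardScaler followed by PCA). The pipeline accepts y in fit_transform and forwards it to each step's fit, but the return value is only the transformed X.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)

pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])

# y is accepted and passed along to each step's fit,
# but only the transformed X (Xt) is returned.
Xt = pipe.fit_transform(X, y)
print(type(Xt), Xt.shape)
```

There is no second return value for y: whatever labels went in have to be kept track of by the caller.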

For regression tasks, there is a special TransformedTargetRegressor, which handles transformations of the target y and can, for example, be used as the final step of a pipeline.
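As a rough sketch of how that might look, assuming synthetic data and a log/exp transform chosen purely for illustration: X passes through the pipeline steps as usual, while TransformedTargetRegressor applies func to y before fitting and inverse_func to the predictions.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only: y is positive and log-linear in X.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(100, 1))
y = np.exp(1.5 * X[:, 0] + 0.5)

# X is scaled by the pipeline; y is log-transformed before the regressor
# is fitted, and predictions are mapped back with exp.
model = make_pipeline(
    StandardScaler(),
    TransformedTargetRegressor(
        regressor=LinearRegression(),
        func=np.log,
        inverse_func=np.exp,
    ),
)
model.fit(X, y)
y_pred = model.predict(X)  # predictions are on the original scale of y
```

Note that this only covers invertible transformations of a regression target; it does not drop samples or encode class labels.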

Other than that, there is no canonical way to control transformations of y in a pipeline.


afsharov


Thanks for the reply. So just to be certain: if I am to use a LabelEncoder or a custom transformer to remove certain instances (X and y), I would have to do it outside the pipeline that performs the other tasks. @afsharov – AkD Jul 08 '21 at 11:28
@AkD that's right. If you need to drop samples, it should be done before fitting the pipeline. Same goes for encodings of `y`. – afsharov Jul 08 '21 at 11:57
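The workflow described in these comments could be sketched roughly as follows. The toy data, the "unknown" marker for rows to drop, and the choice of LogisticRegression are all assumptions made for illustration; the point is only that filtering and label encoding happen before the pipeline is fitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data, for illustration only; "unknown" marks the rows to drop.
X = np.array([[0.0], [0.2], [0.9], [1.1], [0.5], [1.0]])
y = np.array(["cat", "cat", "dog", "dog", "unknown", "dog"])

# 1. Drop samples outside the pipeline, using the same mask for X and y.
keep = y != "unknown"
X_kept, y_kept = X[keep], y[keep]

# 2. Encode the labels outside the pipeline.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_kept)

# 3. The pipeline itself only ever transforms X.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_kept, y_encoded)

# Map integer predictions back to the original string labels.
labels = encoder.inverse_transform(clf.predict(X_kept))
```

Keeping the fitted encoder around is what allows predictions to be mapped back to the original labels later.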
