
My current understanding is that we can't directly transform or retrieve the y labels passed as (X, y) when using a Pipeline.

The fit_transform() call at the end returns only the transformed X, and y is used only by methods such as fit() and fit_predict().

Is my understanding correct?

Also, is there a way to transform and retrieve y (including when dropping instances with a custom transformer) without having to break out of a fully enclosed model training pipeline?

afsharov
AkD

1 Answer


In general, your understanding is correct. Pipeline objects are meant for sequential application of several transformations of X. From the user guide:

Pipelines only transform the observed data (X).

Also have a look at the glossary entry for the term transform:

transform
In a transformer, transforms the input, usually only X, into some transformed space (conventionally notated as Xt).

For regression tasks, there is a special TransformedTargetRegressor, which handles transforming the target y and can, for example, be used as the final step of a pipeline.
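As a minimal sketch of that approach, the snippet below wraps a regressor in TransformedTargetRegressor as the last pipeline step. The synthetic data, the step names, and the choice of `np.log1p`/`np.expm1` as the target transformation are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): a strictly positive target, so log1p is well defined
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.exp(X @ np.array([0.5, -0.2, 0.1]) + 1.0)

model = Pipeline([
    # Pipeline steps before the last one only ever transform X
    ("scale", StandardScaler()),
    # The final step transforms y with func before fitting the regressor
    # and applies inverse_func to the predictions
    ("reg", TransformedTargetRegressor(
        regressor=LinearRegression(),
        func=np.log1p,
        inverse_func=np.expm1,
    )),
])

model.fit(X, y)
y_pred = model.predict(X)  # predictions are back on the original scale of y
```

Note that the transformed y is handled internally here; the pipeline still does not return it to the caller.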

Other than that, there is no canonical way of controlling transformations of y in a pipeline.

afsharov
  • Thanks for the reply. So, just to be certain: if I want to use a LabelEncoder or a custom transformer to remove certain instances (from both X and y), I would have to do it outside the pipeline that performs the other tasks, @afsharov? – AkD Jul 08 '21 at 11:28
  • @AkD that's right. If you need to drop samples, it should be done before fitting the pipeline. Same goes for encodings of `y`. – afsharov Jul 08 '21 at 11:57
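
The workflow described in the comments could look roughly like the sketch below: samples are dropped from X and y together and the labels are encoded, all before the pipeline is fitted. The toy data, the "drop rows with missing features" rule, and the choice of classifier are hypothetical and only there to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data (assumption): one row contains a missing feature value
X = np.array([[1.0, 2.0], [np.nan, 1.0], [3.0, 4.0],
              [5.0, 6.0], [2.0, 1.0], [6.0, 5.0]])
y = np.array(["cat", "dog", "dog", "dog", "cat", "dog"])

# 1. Drop samples outside the pipeline, using one mask for both X and y
keep = ~np.isnan(X).any(axis=1)
X_clean, y_clean = X[keep], y[keep]

# 2. Encode y outside the pipeline as well
encoder = LabelEncoder()
y_enc = encoder.fit_transform(y_clean)

# 3. The pipeline itself only transforms X
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X_clean, y_enc)

# Map integer predictions back to the original labels
pred_labels = encoder.inverse_transform(pipe.predict(X_clean))
```

Keeping the fitted encoder around makes it possible to translate predictions back to the original label names, as in the last line.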