This minimal code works fine with a scikit-learn pipeline:

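from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

inner_pipe = Pipeline(steps=[('scaler', StandardScaler())])
my_pipeline = Pipeline(
    steps=[('pre_step', inner_pipe),
           ('rfc', RandomForestClassifier())])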

But if the pipeline comes from Imbalanced-Learn instead, it produces a TypeError:

All intermediate steps of the chain should be estimators that implement fit and transform or fit_resample. 'Pipeline(steps=[('scaler', StandardScaler())])' implements both)

What is the correct way (if any) of using nested pipelines in Imbalanced-Learn?
If there is none, how can a scikit-learn pipeline and an Imbalanced-Learn pipeline be combined so that everything works?

Edit:
After some testing I found that this works too:

inner_pipe = Pipeline(steps=[('scaler', StandardScaler())])
my_pipeline = Pipeline(
    steps=[*inner_pipe.steps, 
           ('rfc', RandomForestClassifier())])

But I bet this is not a good way of doing it, as it consists of re-instantiating the pipeline's steps instead of passing the actual objects.
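One way to check that concern is to compare object identity after unpacking. This is only a minimal sketch: it assumes scikit-learn is installed and uses `sklearn.pipeline.Pipeline` for the check, and `same_object` is a name made up for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

inner_pipe = Pipeline(steps=[("scaler", StandardScaler())])

# Unpacking copies the (name, estimator) tuples into a new list;
# the estimator objects themselves are not copied at construction time.
my_pipeline = Pipeline(
    steps=[*inner_pipe.steps, ("rfc", RandomForestClassifier())]
)

# Illustrative check: is the scaler in the outer pipeline the very same
# object as the one held by inner_pipe?
same_object = my_pipeline.steps[0][1] is inner_pipe.steps[0][1]
print(same_object)
```

If the check prints `True`, the unpacked pipeline shares its step objects with `inner_pipe` rather than holding fresh instances.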

abdelgha4
  • You should not have to nest `Pipeline`s, indeed. The `Pipeline` provided by `imbalanced-learn` is exactly the same as the `scikit-learn` `Pipeline`. It only adds a method to handle the `fit_resample` of a sampler. So `Pipeline(steps=[("scaler", StandardScaler()), ("clf", RandomForestClassifier())])` using `imblearn.pipeline.Pipeline` will be enough. – glemaitre Nov 17 '21 at 14:02
