
As part of an assignment, I have been trying to whip up a pipeline to preprocess some data that I have. The data has a total of five classes, one of which is imbalanced compared to the others, so I decided to apply SMOTE to it. The code for the preprocessor pipeline can be seen below. It includes a few custom transformers, but they are working properly, so they are not relevant to the question.

from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# TextPreprocessor, ImageBOFTransformer, CardinalityReducer and OrdinalMapper
# are custom transformers; categorical_cols and numerical_cols are defined elsewhere.

text_preprocessor = ImbPipeline([
    ('tp', TextPreprocessor()),
    ('vec', CountVectorizer())
])

image_preprocessor = ImbPipeline([
    ("img", ImageBOFTransformer())
])

numerical_preprocessor = ImbPipeline([
    ("scl", StandardScaler())
])

categorical_preprocessor = ImbPipeline([
    ("rct", CardinalityReducer()),
    # Note: scikit-learn >= 1.2 renames `sparse` to `sparse_output`.
    ("ohe", OneHotEncoder(sparse=False, handle_unknown='ignore'))
])

preprocessor = ImbPipeline([
    ("ord", OrdinalMapper()),
    ('ct', ColumnTransformer([
        ('categorical_preprocessor', categorical_preprocessor, categorical_cols),
        ('numerical_preprocessor', numerical_preprocessor, numerical_cols),
        ('text_preprocessor', text_preprocessor, "Description"),
        ('image_preprocessor', image_preprocessor, "Images")
    ])),
    ('smote', SMOTE())
])

I am able to successfully apply this pipeline to my training data with fit_resample. I then run a number of classifiers on the resampled data and do some hyperparameter tuning. However, when I apply the same pipeline to the test data, to transform it so I can generate predictions with the various classifiers, things go wrong. I use this code:

X_test_t = preprocessor.transform(X_test)

And get this error:

AttributeError: This 'Pipeline' has no attribute 'transform'.

I have no idea why this is happening. Would anyone be able to help me fix it?


1 Answer


The imblearn Pipeline inherits its transform method from the sklearn one, which is only available if the last step itself has a transform method. SMOTE is a sampler: it has fit_resample but no transform, hence the AttributeError. This might be worth raising as an issue on the imblearn GitHub.

Since the resampling shouldn't happen on the test set anyway, maybe the easiest workaround is to slice out the last step and transform with the rest:

X_test_t = preprocessor[:-1].transform(X_test)
Ben Reiniger