I am using the exact example from scikit-learn that compares permutation_importance with the tree-based feature_importances_. As you can see, a Pipeline is used:
rf = Pipeline([
    ('preprocess', preprocessing),
    ('classifier', RandomForestClassifier(random_state=42))
])
rf.fit(X_train, y_train)
permutation_importance:

Now, when you fit a Pipeline, it fits all the transforms one after the other, transforms the data, and then fits the final estimator on the transformed data.
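To make that behavior concrete, here is a minimal sketch with synthetic data (the data, the StandardScaler preprocessing step, and the variable names are my own assumptions, not from the example) showing that fitting a Pipeline is equivalent to manually chaining fit_transform and then fitting the final estimator:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data standing in for X_train / y_train.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

pipe = Pipeline([
    ('preprocess', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=42)),
])
# Internally: StandardScaler().fit_transform(X) -> forest.fit(transformed, y)
pipe.fit(X, y)

# Equivalent manual steps:
Xt = StandardScaler().fit_transform(X)
manual = RandomForestClassifier(random_state=42).fit(Xt, y)

# Same fitted model either way: predictions agree.
assert np.array_equal(pipe.predict(X), manual.predict(Xt))
```

Note that pipe.predict(X) takes the raw X and re-applies the fitted scaler internally, which is why the Pipeline can be handed untransformed data at prediction time.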
Later in the example, they use permutation_importance on the fitted model:
result = permutation_importance(rf, X_test, y_test, n_repeats=10,
                                random_state=42, n_jobs=2)
Problem: What I don't understand is that the features in result are still the original, non-transformed features. Why is this the case? Is it working correctly? What is the purpose of the Pipeline then?
tree feature_importances_:
In the same example, when they use feature_importances_, the results are in terms of the transformed features:

tree_feature_importances = (
    rf.named_steps['classifier'].feature_importances_)
I could obviously transform my features myself and then use permutation_importance, but the steps presented in the example seem intentional, and there should be a reason why permutation_importance does not transform the features.
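For what it's worth, the difference in feature spaces can be reproduced directly. This is a self-contained sketch with made-up data and a one-hot-encoded categorical column (the column names, encoder choice, and sizes are all my assumptions): the tree's feature_importances_ has one entry per transformed column, while permutation_importance returns one entry per original input column, because it shuffles the columns of the raw X passed to the Pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Hypothetical data: one numeric column, one 3-level categorical column.
rng = np.random.RandomState(0)
X = pd.DataFrame({
    'num': rng.normal(size=120),
    'cat': rng.choice(['a', 'b', 'c'], size=120),
})
y = (X['num'] > 0).astype(int)

preprocessing = ColumnTransformer(
    [('onehot', OneHotEncoder(), ['cat'])],
    remainder='passthrough',  # keep 'num' as-is
)
rf = Pipeline([
    ('preprocess', preprocessing),
    ('classifier', RandomForestClassifier(random_state=42)),
])
rf.fit(X, y)

# Tree importances live in the transformed space: 3 one-hot columns + 'num'.
tree_imp = rf.named_steps['classifier'].feature_importances_
assert tree_imp.shape == (4,)

# Permutation importances live in the original space: the 2 input columns.
result = permutation_importance(rf, X, y, n_repeats=5, random_state=42)
assert result.importances_mean.shape == (2,)
```

So permutation_importance shuffles whatever X you hand it; since the Pipeline re-applies preprocessing on every scoring call, passing the raw X_test yields importances for the original features.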