I am using the exact example from scikit-learn that compares permutation_importance with the tree-based feature_importances_. As you can see, a Pipeline is used:
rf = Pipeline([
    ('preprocess', preprocessing),
    ('classifier', RandomForestClassifier(random_state=42))
])
rf.fit(X_train, y_train)
permutation_importance:

Now, when you fit a Pipeline, it fits all the transforms one after the other, transforms the data, and then fits the final estimator on the transformed data.
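To make that behavior concrete, here is a minimal sketch with synthetic data (the data, the StandardScaler preprocessing step, and the variable names are my own assumptions, not from the example) showing that fitting a Pipeline is equivalent to manually chaining fit_transform and then fitting the final estimator:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data standing in for X_train / y_train.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

pipe = Pipeline([
    ('preprocess', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=42)),
])
# Internally: StandardScaler().fit_transform(X) -> forest.fit(transformed, y)
pipe.fit(X, y)

# Equivalent manual steps:
Xt = StandardScaler().fit_transform(X)
manual = RandomForestClassifier(random_state=42).fit(Xt, y)

# Same fitted model either way: predictions agree.
assert np.array_equal(pipe.predict(X), manual.predict(Xt))
```

Note that pipe.predict(X) takes the raw X and re-applies the fitted scaler internally, which is why the Pipeline can be handed untransformed data at prediction time.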
Later in the example, they use permutation_importance on the fitted model:
result = permutation_importance(rf, X_test, y_test, n_repeats=10,
                                random_state=42, n_jobs=2)
Problem: What I don't understand is that the features in result are still the original, non-transformed features. Why is this the case? Is it working correctly? What is the purpose of the Pipeline then?
tree feature_importances_:
In the same example, when they use feature_importances_, the results are in terms of the transformed features:

tree_feature_importances = (
    rf.named_steps['classifier'].feature_importances_)
I could obviously transform my features myself and then use permutation_importance, but the steps presented in the example seem intentional, and there should be a reason why permutation_importance does not transform the features.
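For what it's worth, the difference in feature spaces can be reproduced directly. This is a self-contained sketch with made-up data and a one-hot-encoded categorical column (the column names, encoder choice, and sizes are all my assumptions): the tree's feature_importances_ has one entry per transformed column, while permutation_importance returns one entry per original input column, because it shuffles the columns of the raw X passed to the Pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Hypothetical data: one numeric column, one 3-level categorical column.
rng = np.random.RandomState(0)
X = pd.DataFrame({
    'num': rng.normal(size=120),
    'cat': rng.choice(['a', 'b', 'c'], size=120),
})
y = (X['num'] > 0).astype(int)

preprocessing = ColumnTransformer(
    [('onehot', OneHotEncoder(), ['cat'])],
    remainder='passthrough',  # keep 'num' as-is
)
rf = Pipeline([
    ('preprocess', preprocessing),
    ('classifier', RandomForestClassifier(random_state=42)),
])
rf.fit(X, y)

# Tree importances live in the transformed space: 3 one-hot columns + 'num'.
tree_imp = rf.named_steps['classifier'].feature_importances_
assert tree_imp.shape == (4,)

# Permutation importances live in the original space: the 2 input columns.
result = permutation_importance(rf, X, y, n_repeats=5, random_state=42)
assert result.importances_mean.shape == (2,)
```

So permutation_importance shuffles whatever X you hand it; since the Pipeline re-applies preprocessing on every scoring call, passing the raw X_test yields importances for the original features.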