I am using RandomForestClassifier() with 10-fold cross-validation as follows.
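from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

clf = RandomForestClassifier(random_state=42, class_weight="balanced")
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
# One accuracy score per fold; report the mean across the 10 folds
accuracy = cross_val_score(clf, X, y, cv=k_fold, scoring='accuracy')
print(accuracy.mean())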
I want to identify the important features in my feature space. Getting the feature importances from a single fitted classifier seems straightforward:
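import pandas as pd

# clf must already be fitted (e.g., on a training split) before reading feature_importances_
print("Features sorted by their score:")
feature_importances = pd.DataFrame(clf.feature_importances_,
                                   index=X_train.columns,
                                   columns=['importance']).sort_values('importance', ascending=False)
print(feature_importances)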
However, I could not find how to obtain feature importances under cross-validation in sklearn.
In summary, I want to identify the most effective features (e.g., by using an average importance score) across the 10 folds of cross-validation.
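To make the goal concrete, here is a minimal sketch of the kind of averaging I have in mind; it is not a confirmed solution. It assumes that cross_validate with return_estimator=True is an acceptable replacement for cross_val_score, since it returns the fitted model of each fold. The synthetic data from make_classification and the names X_demo, y_demo, fold_importances, and mean_importances are placeholders for illustration, not my real data.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Assumption: synthetic stand-in data; replace with the real X (DataFrame) and y
X_arr, y_demo = make_classification(n_samples=200, n_features=8,
                                    n_informative=3, random_state=42)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])

clf = RandomForestClassifier(random_state=42, class_weight="balanced")
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# return_estimator=True keeps the classifier fitted on each fold's training part
cv_results = cross_validate(clf, X_demo, y_demo, cv=k_fold,
                            scoring="accuracy", return_estimator=True)

# One row of impurity-based importances per fold: shape (n_folds, n_features)
fold_importances = np.array(
    [est.feature_importances_ for est in cv_results["estimator"]]
)

# Average (and spread) of each feature's importance across the folds
mean_importances = pd.DataFrame(
    {"mean_importance": fold_importances.mean(axis=0),
     "std_importance": fold_importances.std(axis=0)},
    index=X_demo.columns,
).sort_values("mean_importance", ascending=False)

print(cv_results["test_score"].mean())
print(mean_importances)
```

Each fold's importances here come from a model trained on that fold's training portion only, so the standard deviation column gives a rough idea of how stable a feature's ranking is across folds.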
I am happy to provide more details if needed.