
I have a Random Forest model for a dataset with 72 features. The objective is to find the feature importances and use them for feature selection.

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=XXX)  # XXX = number of trees
rf.fit(X, y)

I am not able to get the list of predictors together with their importance values; rf.feature_importances_ just provides 72 importance numbers, which are very difficult to map to the feature names. Is there a way to get the names and importances together, like

A 0.55656
B 0.4333

etc

  • This example does exactly what you want: http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html – Vivek Kumar Dec 07 '17 at 05:22

2 Answers


Suppose your feature names are stored in a list called feature_labels.

You can then print each feature alongside its importance as follows:

for feature in zip(feature_labels, rf.feature_importances_):
    print(feature)

The values printed are the importance scores for each variable. One thing to remember here is that all the importance scores sum to 1.0 (i.e., 100%).
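
If you want the pairs ordered from most to least important, you can sort them before printing (a minimal sketch, reusing the feature_labels list and the fitted rf from above):

# Sort (name, importance) pairs from highest to lowest importance
for name, score in sorted(zip(feature_labels, rf.feature_importances_),
                          key=lambda pair: pair[1], reverse=True):
    print('{:<20} {:.5f}'.format(name, score))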

In order to identify and select the most important features, you can use SelectFromModel:

from sklearn.feature_selection import SelectFromModel

# Create a selector object that will use the random forest model to identify
# features that have an importance of more than 0.15
sfm = SelectFromModel(rf, threshold=0.15)

# Train the selector
sfm.fit(X_train, y_train)

# Example repr of the fitted selector returned by fit(). (Here the wrapped
# estimator happens to be a RandomForestClassifier; the pattern is the same
# for a regressor.)
'''SelectFromModel(estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=10000, n_jobs=-1, oob_score=False, random_state=0,
            verbose=0, warm_start=False),
        prefit=False, threshold=0.15)'''

# Print the names of the most important features
for feature_list_index in sfm.get_support(indices=True):
    print(feature_labels[feature_list_index])

This will print the names of the most important features based on the threshold you set.
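
Once the selector is fitted, you would typically call transform to keep only the selected columns (a short sketch; X_test is an assumed hold-out set to go with X_train):

# Reduce the data to only the features whose importance exceeds the threshold
X_important_train = sfm.transform(X_train)
X_important_test = sfm.transform(X_test)

You can then retrain your model on the reduced feature matrices.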

codeslord

The feature_importances_ attribute preserves the order of the features the model was trained on, so you can zip your original feature list with feature_importances_ to get the importance of each feature.
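
For example (a brief sketch, assuming the same feature_labels list and a fitted rf; 'A' stands in for one of your actual feature names):

# Map each feature name to its importance score
importance_by_name = dict(zip(feature_labels, rf.feature_importances_))
print(importance_by_name['A'])  # look up the importance of feature 'A'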

yoav_aaa