I want to use sklearn.feature_selection.SelectFromModel to extract features in a multi-step regression problem. The regression problem predicts multiple values using MultiOutputRegressor in combination with RandomForestRegressor. When I try to get the selected features with SelectFromModel.get_support(), it raises an error indicating that I need to make feature_importances_ accessible for the method to work.
It is possible to access the feature_importances_ of the individual estimators inside a MultiOutputRegressor, as indicated in this question. However, I am unsure how to pass these feature_importances_ correctly to the SelectFromModel class.
Here is what I did so far:
# make sample data
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
X, y = make_regression(n_samples=100, n_features=100, n_targets=5)
print(X.shape, y.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, shuffle=True)
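# get important features for prediction problem:
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.multioutput import MultiOutputRegressor
regr_multirf = MultiOutputRegressor(RandomForestRegressor(n_estimators=100))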
regr_multirf = regr_multirf.fit(X_train, y_train)
sel = SelectFromModel(regr_multirf, max_features= int(np.floor(X_train.shape[1] / 3.)))
sel.fit(X_train, y_train)
sel.get_support()
# feature_importances_ of a single underlying estimator (here the one for the second target) can be read like this:
regr_multirf.estimators_[1].feature_importances_
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-72-a1d635ad4a34> in <module>()
5 sel = SelectFromModel(regr_multirf, max_features= int(np.floor(X_train.shape[1] / 3.)))
6 sel.fit(X_train, y_train)
----> 7 sel.get_support()
2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/feature_selection/_from_model.py in _get_feature_importances(estimator, norm_order)
30 "`feature_importances_` attribute. Either pass a fitted estimator"
31 " to SelectFromModel or call fit before calling transform."
---> 32 % estimator.__class__.__name__)
33
34 return importances
ValueError: The underlying estimator MultiOutputRegressor has no `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to SelectFromModel or call fit before calling transform.
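One direction that might work, sketched here under assumptions rather than as a confirmed solution: SelectFromModel accepts an importance_getter argument (as far as I can tell, from scikit-learn 0.24 onward), which can be a callable that receives the fitted estimator and returns one importance per feature. The helper mean_importances below is a hypothetical name of my own, and averaging the importances over the per-target forests is just one possible way to aggregate them; a maximum or a weighted sum may suit other problems better.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.multioutput import MultiOutputRegressor


def mean_importances(multi_estimator):
    # Hypothetical helper: average feature_importances_ over the
    # per-target forests stored in estimators_ (an assumed aggregation).
    return np.mean(
        [est.feature_importances_ for est in multi_estimator.estimators_],
        axis=0,
    )


X, y = make_regression(n_samples=100, n_features=100, n_targets=5, random_state=0)

sel = SelectFromModel(
    MultiOutputRegressor(RandomForestRegressor(n_estimators=20, random_state=0)),
    max_features=X.shape[1] // 3,        # upper bound on the number of kept features
    importance_getter=mean_importances,  # replaces the coef_/feature_importances_ lookup
)
sel.fit(X, y)              # fits a clone of the wrapped estimator
mask = sel.get_support()   # boolean mask with one entry per input feature
print(mask.sum(), "features selected")
```

With the default threshold, fewer than max_features features may be kept, since features below the mean importance are dropped first; passing threshold=-np.inf should make max_features the only criterion.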
Any help and hints would be appreciated.