I'm trying to use BorutaPy for feature selection, but I'm getting a TypeError: '(slice(None, None, None), array([0, 1, 2, 3, 4]))' is an invalid key.

from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy
rf = RandomForestClassifier(n_jobs=-1, max_depth=4)

# define Boruta feature selection method
feat_selector = BorutaPy(rf, n_estimators='auto', verbose=2, random_state=1)

X = train_dt[['age', 'menopause', 'tumor_size', 'inv_nodes', 'node_caps',
   'deg_malig', 'breast', 'breast_quad', 'irradiat']]
Y = train_dt.label

# find all relevant features - 5 features should be selected
feat_selector.fit(x, y)

# check selected features - first 5 features are selected
feat_selector.support_

# check ranking of features
feat_selector.ranking_

# call transform() on X to filter it down to selected features
X_filtered = feat_selector.transform(X)

I used the breast cancer dataset with a few small tweaks: adding a header row, feature scaling, and missing-value handling.

Michel Das

1 Answer

I ran into the same error. I could trace it back to the pandas IndexEngine but couldn't figure out exactly what goes wrong there. You can get the model to run by converting your DataFrames to NumPy arrays, e.g.:

feat_selector.fit(x.values, y.values)

Additionally, you defined your data as X and Y (uppercase) but then call fit(x, y) in lowercase. Run exactly as posted, that would have raised a different error (a NameError) rather than this TypeError, so I assume it's just a typo in the question.
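A possible explanation, judging only from the key shown in the error message: something is indexing the input NumPy-style, with a slice for the rows and an integer array for the columns, and a pandas DataFrame does not accept that kind of key through plain `[]`. The sketch below reproduces this on a small made-up frame (the `df` and `cols` names and the data are hypothetical, not from the question) and shows that the same key works on the underlying array returned by `.values`. I have not checked BorutaPy's source, so treat this as an illustration of the symptom rather than a confirmed diagnosis.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the question's train_dt
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["a", "b", "c", "d"])
cols = np.array([0, 1])  # integer column positions, like the array in the error

# NumPy-style (rows, columns) indexing is not supported by DataFrame.__getitem__
try:
    df[:, cols]
    frame_indexing_failed = False
except Exception as exc:  # the exact exception class depends on the pandas version
    frame_indexing_failed = True
    print(type(exc).__name__, exc)

# The same key works on the underlying NumPy array
selected = df.values[:, cols]
print(selected.shape)  # (3, 2)
```

If you need the column names afterwards, you can keep `X.columns` around and index it with the boolean mask in `feat_selector.support_` once fitting has finished.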

ThomasW