
Funny issue here: I have GridSearchCV results which, after cherry-picking from the `grid_search_cv.cv_results_` attribute, are captured as follows:

Input: pd.DataFrame(grid_clf_rf.cv_results_).iloc[4966]['params']

Output: {'rf__max_depth': 40, 'rf__max_features': 2, 'rf__n_estimators': 310}

Now, as I understand it, the Imbalanced-Learn package's Pipeline object is a wrapper around scikit-learn's Pipeline, and its `.fit()` method should accept a `**fit_params` argument, as follows:

clf = BalancedRandomForestClassifier(random_state = random_state, 
                                 n_jobs = n_jobs)

pipeline = Pipeline([('nt', nt), ('rf', clf)])

pipeline.fit(X_train, y_train, **pd.DataFrame(grid_clf_rf.cv_results_).iloc[4966]['params'])

However, when I execute the above expression, I get the following result:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-64-a26424dc8038> in <module>
      4 pipeline = Pipeline([('nt', nt), ('rf', clf)])
      5 
----> 6 pipeline.fit(X_train, y_train, **pd.DataFrame(grid_clf_rf.cv_results_).iloc[4966]['params'])
      7 
      8 print_scores(pipeline, X_train, y_train, X_test, y_test)

/opt/conda/lib/python3.7/site-packages/imblearn/pipeline.py in fit(self, X, y, **fit_params)
    237         Xt, yt, fit_params = self._fit(X, y, **fit_params)
    238         if self._final_estimator is not None:
--> 239             self._final_estimator.fit(Xt, yt, **fit_params)
    240         return self
    241 

TypeError: fit() got an unexpected keyword argument 'max_features'

Any ideas what am I doing wrong?

Venkatachalam
Greem666

2 Answers


Let us assume you come up with a set of hyperparameters such as the following:

hyper_params=  {'rf__max_depth': 40, 'rf__max_features': 2, 'rf__n_estimators': 310}

As mentioned by @Parthasarathy Subburaj, these are not `fit_params`. We can set these params for a classifier inside a pipeline using the `.set_params()` method:

from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.datasets import make_classification
from imblearn.pipeline import Pipeline

X, y = make_classification(n_samples=1000, n_classes=3,
                           n_informative=4, weights=[0.2, 0.3, 0.5],
                           random_state=0)

clf = BalancedRandomForestClassifier(random_state=0)

pipeline = Pipeline([('rf', clf)])

hyper_params = {'rf__max_depth': 40, 'rf__max_features': 2, 'rf__n_estimators': 310}
pipeline.set_params(**hyper_params)

pipeline.fit(X,y)

#
Pipeline(memory=None,
         steps=[('rf',
                 BalancedRandomForestClassifier(bootstrap=True,
                                                class_weight=None,
                                                criterion='gini', max_depth=40,
                                                max_features=2,
                                                max_leaf_nodes=None,
                                                min_impurity_decrease=0.0,
                                                min_samples_leaf=2,
                                                min_samples_split=2,
                                                min_weight_fraction_leaf=0.0,
                                                n_estimators=310, n_jobs=1,
                                                oob_score=False, random_state=0,
                                                replacement=False,
                                                sampling_strategy='auto',
                                                verbose=0, warm_start=False))],
         verbose=False)
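To confirm that the parameters actually reached the classifier, you can read them back off the step via `named_steps`. A minimal sketch (using scikit-learn's own `Pipeline` and `RandomForestClassifier` as stand-ins, since imblearn's `Pipeline` mirrors scikit-learn's here):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

clf = RandomForestClassifier(random_state=0)
pipeline = Pipeline([('rf', clf)])

# The 'rf__' prefix routes each parameter to the step named 'rf'.
hyper_params = {'rf__max_depth': 40, 'rf__max_features': 2, 'rf__n_estimators': 310}
pipeline.set_params(**hyper_params)

# Read the values back from the classifier inside the pipeline.
print(pipeline.named_steps['rf'].max_depth)     # 40
print(pipeline.named_steps['rf'].n_estimators)  # 310
```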
Venkatachalam

Why are you feeding the dataframe containing the model parameters into your `.fit()` method? It takes just two arguments, your X and y. You need to pass the model parameters to the `BalancedRandomForestClassifier` constructor. Since your parameter names don't match the ones `BalancedRandomForestClassifier` takes, you need to feed them in manually like this:

clf = BalancedRandomForestClassifier(max_depth = 40, max_features = 2, n_estimators = 310, random_state = random_state, n_jobs = n_jobs)
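If you would rather reuse the grid's dictionary than retype the values, a small sketch (assuming the `params` dict from the question, and using `RandomForestClassifier` as a stand-in; the same idea applies to `BalancedRandomForestClassifier`) can strip the `rf__` step prefix so the keys match the constructor's argument names:

```python
from sklearn.ensemble import RandomForestClassifier

params = {'rf__max_depth': 40, 'rf__max_features': 2, 'rf__n_estimators': 310}

# Drop the 'rf__' pipeline-step prefix from each key.
clf_params = {key.split('__', 1)[1]: value for key, value in params.items()}
# clf_params is now {'max_depth': 40, 'max_features': 2, 'n_estimators': 310}

clf = RandomForestClassifier(**clf_params, random_state=0)
```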

Hope this helps!

Parthasarathy Subburaj
  • Thank you for your answer, but I was already aware of that possibility. What I need is the ability to load GridSearchCV's output into a .fit() method of the pipeline due to other reasons. – Greem666 Jun 28 '19 at 08:35
  • 1
    In that case you can try the `.best_params_` which is an attribute of gridsearch object which returns a dictionary of the best set of parameters, with keys being the parameter names and values being the value of the parameter itself. – Parthasarathy Subburaj Jun 28 '19 at 09:28
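Putting that comment into code, a hedged sketch (a toy grid search standing in for `grid_clf_rf`; the keys in `best_params_` already carry the `rf__` step prefix, so they can go straight into `set_params`):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline  # imblearn's Pipeline behaves the same way here

X, y = make_classification(n_samples=200, random_state=0)

pipeline = Pipeline([('rf', RandomForestClassifier(random_state=0))])
grid = GridSearchCV(pipeline, {'rf__n_estimators': [10, 20]}, cv=3)
grid.fit(X, y)

# best_params_ keys are already pipeline-prefixed, e.g. {'rf__n_estimators': ...},
# so they can be applied to a fresh pipeline and refit on the full data.
pipeline.set_params(**grid.best_params_)
pipeline.fit(X, y)
```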