-1

I'm trying to get randomized hyperparameter search to work with the voting classifier from sklearn by adapting the example given in the sklearn documentation.

I've seen this minimal working example, but it breaks in many ways using my version of sklearn.

Here is a stripped-down example:

import numpy as np
from sklearn import __version__ as skv
from sklearn.ensemble import RandomForestClassifier as RFClassi
from sklearn.ensemble import HistGradientBoostingClassifier as HGBClassi
from sklearn.tree import DecisionTreeClassifier as DTClassi
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import VotingClassifier
from sklearn.datasets import load_iris

print(f"sklearn version: {skv}")

df_X, target = load_iris(return_X_y=True, as_frame=True)
ensemble = ['rf','dtree','hgb']
hy_pa_grid = {
    'hgb': dict(learning_rate = list(np.linspace(0.01,0.5,10).round(3))),
    'rf':dict(criterion = ['gini', 'entropy']),
    'dtree':dict(criterion = ['gini', 'entropy']),
}
clfs = {'hgb' : HGBClassi(), 'rf': RFClassi(), 'dtree' : DTClassi()}
vc = VotingClassifier(estimators = clfs.items(), voting = 'soft')
params = {
    f"{c}__{p}" : hy_pa_grid[c][p]
    for c in ensemble
    for p in hy_pa_grid[c].keys()
}
print("\n".join(map(str,params.items())))
clf = RandomizedSearchCV(estimator = vc, param_distributions = params)
clf.fit(df_X,target)

The output I get is this:

sklearn version: 1.1.3
{'rf__criterion': ['gini', 'entropy'], 'dtree__criterion': ['gini', 'entropy'], 'hgb__learning_rate': [0.01, 0.064, 0.119, 0.173, 0.228, 0.282, 0.337, 0.391, 0.446, 0.5]}
Traceback (most recent call last):
  File "vc.py", line 34, in <module>
    clf.fit(df_X,target)                
  File "/home/USER/.local/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 789, in fit
    base_estimator = clone(self.estimator)
  File "/home/USER/.local/lib/python3.8/site-packages/sklearn/base.py", line 87, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/home/USER/.local/lib/python3.8/site-packages/sklearn/base.py", line 68, in clone
    return copy.deepcopy(estimator)
  File "/usr/lib/python3.8/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'dict_items' object

Any ideas for getting round this? I also tried doing it with GridSearchCV, as in the example, but I get the same error.

Mr Felix U
  • 295
  • 1
  • 11

1 Answers1

0

Oops, it turns out the problem was in

estimators = clfs.items()

All was well once I wrapped it in tuple() to be an actual tuple rather than a generator.

Mr Felix U
  • 295
  • 1
  • 11