
I am trying to run a hyperparameter search in sklearn using RandomizedSearchCV together with a group k-fold cross-validation generator (GroupKFold). The following works:

import sklearn.model_selection
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rs = sklearn.model_selection.RandomizedSearchCV(clf, parameters, scoring='roc_auc', cv=skf, n_iter=10)
rs.fit(X, y)

This doesn't:

from sklearn.model_selection import GroupKFold

gkf = GroupKFold(n_splits=5)
rs = sklearn.model_selection.RandomizedSearchCV(clf, parameters, scoring='roc_auc', cv=gkf, n_iter=10)
rs.fit(X, y)

# ValueError: The groups parameter should not be None

How do I indicate the groups parameter?

Neither does this:

gkf = GroupKFold(n_splits=5)
fv = gkf.split(X, y, groups=groups)
rs = sklearn.model_selection.RandomizedSearchCV(clf, parameters, scoring='roc_auc', cv=fv, n_iter=10)
rs.fit(X, y)

# TypeError: object of type 'generator' has no len()
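If you do want to build the splits yourself, one workaround, sketched here under assumptions, is to turn the generator into a list before handing it to cv. The cv parameter accepts an iterable of (train, test) index arrays, and a list, unlike a generator, supports len() and can be iterated more than once. The synthetic data, the LogisticRegression estimator and its "C" grid below are hypothetical stand-ins for X, y, groups, clf and parameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, RandomizedSearchCV

# Hypothetical synthetic data: 120 samples, 4 features, 12 groups of 10 samples
rng = np.random.RandomState(0)
X = rng.normal(size=(120, 4))
y = rng.randint(0, 2, size=120)
groups = np.repeat(np.arange(12), 10)

# Materialize the split generator: a list has len() and can be iterated repeatedly
splits = list(GroupKFold(n_splits=3).split(X, y, groups=groups))

rs = RandomizedSearchCV(
    LogisticRegression(),
    {"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical search space
    scoring="roc_auc",
    cv=splits,  # precomputed (train, test) index pairs
    n_iter=3,
    random_state=0,
)
# No groups argument here: the group structure is already baked into the splits
rs.fit(X, y)
print(rs.best_params_)
```

The trade-off is that the splits are fixed to this particular X; the approach in the answer below keeps the splitter itself in cv and lets the search compute the folds.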
Asked by Sam Weisenthal; edited by Mattravel.

1 Answer


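For reference, this is done by constructing the search with the GroupKFold object as cv:

rs = sklearn.model_selection.RandomizedSearchCV(forest, parameters, scoring='roc_auc', cv=gkf, n_iter=10)

and then passing the groups to fit(), which forwards them to gkf.split():

rs.fit(X, y, groups=groups)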
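A self-contained sketch of this approach, assuming synthetic data: X, y, groups, the RandomForestClassifier and the parameter lists are hypothetical stand-ins for the question's objects. The only point it illustrates is that the splitter goes into cv and the groups go into fit().

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, RandomizedSearchCV

# Hypothetical synthetic data: 200 samples, 5 features, 20 groups of 10 samples
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = rng.randint(0, 2, size=200)
groups = np.repeat(np.arange(20), 10)

# Hypothetical search space; lists are sampled without replacement
parameters = {"n_estimators": [10, 20, 30], "max_depth": [2, 3, 4]}

gkf = GroupKFold(n_splits=5)
rs = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    parameters,
    scoring="roc_auc",
    cv=gkf,  # the splitter object, not a generator
    n_iter=5,
    random_state=0,
)
# groups is forwarded to gkf.split() when the search builds its folds
rs.fit(X, y, groups=groups)
print(rs.best_params_)

# Sanity check: no group appears in both the train and test part of any fold
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```

Keeping gkf in cv (rather than a list of splits) means the same search object can be refit on different data, as long as a matching groups array is passed each time.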
Answered by Sam Weisenthal.