
When using GridSearchCV on a custom estimator that wraps SVC, I get the error: "ValueError: The number of classes has to be greater than one; got 1 class"

The custom estimator exists to expose additional grid-search parameters, and it otherwise seemed to work fine.

Using the debugger, I found that a one-class-only training set is indeed passed to my estimator, so there are two possibilities:

  • Either the estimator should handle one-class-only sets,

  • or GridSearchCV should not produce one-class-only sets.

Since the error comes from the SVC.fit call, and SVC is apparently not supposed to receive one-class-only sets, I think it is the second option. However, I have looked through the GridSearchCV implementation and could not find where it checks for one-class-only folds, or why that check would fail.

I used the grid search inside a cross validation to do a nested cross validation:

gs = GridSearchCV(clf.gs_clf.get_gs_clf(), parameter_grid, cv=n_inner_splits, iid=False)
gs.fit(*clf.get_train_set(X, y, train_index))

Abel Adary

2 Answers


As you already mentioned in your question, the problem is that some cross-validation splits do not include any data from the second class. This is probably caused by class imbalance in your data: when the data is divided into n_inner_splits stratified folds, the under-represented class ends up missing from some of them.

To overcome this, you could try one of the following:

1- Decrease n_inner_splits to match the proportion of the under-represented class and the number of instances you have.

2- Instead of passing an integer to the cv parameter of the grid search, do the splitting yourself and pass an iterable yielding (train, test) splits as arrays of indices, built so that both classes are always represented.

3- Generate or acquire more data for the under-represented class.

Check the documentation of the cv parameter for other ways to solve this.
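Option 2 above could look roughly like the sketch below. It is only an illustration under assumptions: the toy data, the parameter grid, and the use of a plain SVC in place of the asker's wrapper are all made up for the example. The splits are built with StratifiedKFold, checked explicitly, and then passed to GridSearchCV through cv.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Hypothetical imbalanced toy data: 16 samples of class 0, 4 of class 1.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(16, 2)),
    rng.normal(3.0, 1.0, size=(4, 2)),
])
y = np.array([0] * 16 + [1] * 4)

# Keep the number of folds at or below the size of the rarest class.
n_inner_splits = 4

# Build the (train, test) index pairs ourselves instead of passing an integer.
splitter = StratifiedKFold(n_splits=n_inner_splits, shuffle=True, random_state=0)
splits = list(splitter.split(X, y))

# Fail early, with a clear message, if any training split has a single class.
for train_idx, _ in splits:
    if len(np.unique(y[train_idx])) < 2:
        raise ValueError("a training split contains only one class")

# cv accepts an iterable of (train, test) index arrays.
gs = GridSearchCV(SVC(), {"C": [0.1, 1.0, 10.0]}, cv=splits)
gs.fit(X, y)
print(gs.best_params_)
```

Materializing the splits as a list makes them easy to inspect before the search starts, and the same splits are then reused for every parameter combination.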

Ahmed Ragab
  • Thank you for answering. I think I will try option 2, as I have a small dataset and need to keep the same inner splits. Out of curiosity, do you think GridSearchCV does not check for one-class-only sets? – Abel Adary Apr 17 '19 at 12:21

I found the real issue. The documentation of GridSearchCV says this about the cv parameter:

# For integer/None inputs, if the estimator is a classifier and ``y`` is
# either binary or multiclass, `StratifiedKFold` is used. In all
# other cases, `KFold` is used.

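One-class-only subsets are not possible with StratifiedKFold. But because my custom estimator did not identify itself as a classifier, GridSearchCV fell back to KFold, which can produce them.

So the solution was for my custom estimator to inherit from sklearn.base.ClassifierMixin.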
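A minimal sketch of that difference, under assumptions: both wrapper classes and the sorted toy labels below are hypothetical stand-ins for the real estimator and data. It uses is_classifier and check_cv to show which splitter an integer cv would resolve to for each wrapper.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, is_classifier
from sklearn.model_selection import check_cv
from sklearn.svm import SVC


class PlainSVCWrapper(BaseEstimator):
    """Hypothetical wrapper around SVC that is NOT marked as a classifier."""

    def __init__(self, C=1.0, kernel="rbf"):
        self.C = C
        self.kernel = kernel

    def fit(self, X, y):
        self.svc_ = SVC(C=self.C, kernel=self.kernel).fit(X, y)
        self.classes_ = self.svc_.classes_
        return self

    def predict(self, X):
        return self.svc_.predict(X)


class ClassifierSVCWrapper(ClassifierMixin, PlainSVCWrapper):
    """Same wrapper, but inheriting from ClassifierMixin."""


def count_single_class_train_folds(cv, X, y):
    """Count the training folds that contain only one class."""
    return sum(
        1 for train_idx, _ in cv.split(X, y) if len(np.unique(y[train_idx])) < 2
    )


# Toy labels sorted by class, the worst case for an unshuffled KFold.
X = np.zeros((20, 1))
y = np.array([0] * 10 + [1] * 10)

# Resolve an integer cv the way the quoted documentation describes.
plain_cv = check_cv(2, y, classifier=is_classifier(PlainSVCWrapper()))
clf_cv = check_cv(2, y, classifier=is_classifier(ClassifierSVCWrapper()))

print(type(plain_cv).__name__, count_single_class_train_folds(plain_cv, X, y))
print(type(clf_cv).__name__, count_single_class_train_folds(clf_cv, X, y))
```

With labels sorted like this, the plain wrapper should resolve to KFold and get single-class training folds, while the ClassifierMixin version should resolve to StratifiedKFold and get none.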

Abel Adary