16

This is my target (y):

target = [7,1,2,2,3,5,4,
      1,3,1,4,4,6,6,
      7,5,7,8,8,8,5,
      3,3,6,2,7,7,1,
      10,3,7,10,4,10,
      2,2,2,7]

I don't understand why I get an error when I run this:

...
# Split the data set in two equal parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Set the parameters by cross-validation
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                 'C': [1, 10, 100, 1000]},
                {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

scores = ['precision', 'recall']

for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()

    clf = GridSearchCV(SVC(C=1), tuned_parameters)  # scoring does not exist
    # I get an error in the line below
    clf.fit(X_train, y_train, cv=5)
...

I get this error:

Traceback (most recent call last):
  File "C:\Python27\SVMpredictCROSSeGRID.py", line 232, in <module>
    clf.fit(X_train, y_train, cv=5)  #The minimum number of labels for any class cannot be less than k=3.
  File "C:\Python27\lib\site-packages\sklearn\grid_search.py", line 354, in fit
    return self._fit(X, y)
  File "C:\Python27\lib\site-packages\sklearn\grid_search.py", line 372, in _fit
    cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 1148, in check_cv
    cv = StratifiedKFold(y, cv, indices=is_sparse)
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 358, in __init__
    " be less than k=%d." % (min_labels, k))
ValueError: The least populated class in y has only 1 members, which is too few. The minimum number of labels for any class cannot be less than k=3.
Peter Mortensen
postgres

2 Answers

19

Stratified cross-validation with k=3 folds (the value shown in the error message) requires at least 3 instances of each label in the data you fit on. Your full target array does contain at least 3 instances of each label, but after you split it into training and test halves, not every label still has 3 instances in the training set.

You either need to merge some class labels or increase your training samples to solve the problem.
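To make this concrete, here is a minimal sketch that counts the labels before and after a 50/50 split. It uses only the standard library, and the shuffle is a stand-in for `train_test_split`, not the same random split the question produced. Whatever the split, the outcome is the same in kind: there are 9 distinct labels, so keeping 3 of each would need 27 training samples, while half of 38 is only 19.

```python
import random
from collections import Counter

target = [7, 1, 2, 2, 3, 5, 4,
          1, 3, 1, 4, 4, 6, 6,
          7, 5, 7, 8, 8, 8, 5,
          3, 3, 6, 2, 7, 7, 1,
          10, 3, 7, 10, 4, 10,
          2, 2, 2, 7]

k = 3  # number of folds reported in the error message

# Label counts in the full data set: every label has at least k members.
full_counts = Counter(target)

# Stand-in for train_test_split(..., test_size=0.5): shuffle, keep one half.
# (Assumption: any 50/50 split, not the exact one from the question.)
shuffled = target[:]
random.Random(0).shuffle(shuffled)
y_train = shuffled[:len(shuffled) // 2]
train_counts = Counter(y_train)

# Labels that have fewer than k members in the training half.
too_small = {label: train_counts.get(label, 0)
             for label in sorted(full_counts)
             if train_counts.get(label, 0) < k}

print("full data :", dict(sorted(full_counts.items())))
print("train half:", dict(sorted(train_counts.items())))
print("labels with fewer than k members in train:", too_small)
```

Running the same count on your own `y_train` shows which labels would have to be merged or need more samples.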

jitendra
  • 1
    You could also pass a "cv" parameter, for example "KFold". Which version do you have, by the way? I think the input validation for StratifiedKFold (the default cv) got less strict in newer versions of sklearn. Be careful in interpreting the results, though. They are probably not that meaningful. – Andreas Mueller Feb 18 '13 at 11:03
  • 1
    @AndreasMueller, Haven't tried input validation in case of StratifiedKFold. I will definitely check. Thanks for the suggestion. – jitendra Feb 18 '13 at 18:15
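The suggestion in the comment above, passing an explicit non-stratified `cv`, might look like the sketch below. It assumes a recent scikit-learn, where these classes live in `sklearn.model_selection` rather than the `sklearn.grid_search` and `sklearn.cross_validation` modules seen in the traceback. The feature matrix `X` is random placeholder data, since the question does not show the real features. In this API `cv` is an argument of the `GridSearchCV` constructor, not of `fit`; the traceback reports k=3 even though `cv=5` was passed to `fit`, which suggests that argument was not being used.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.svm import SVC

target = [7, 1, 2, 2, 3, 5, 4,
          1, 3, 1, 4, 4, 6, 6,
          7, 5, 7, 8, 8, 8, 5,
          3, 3, 6, 2, 7, 7, 1,
          10, 3, 7, 10, 4, 10,
          2, 2, 2, 7]
y = np.array(target)

# Placeholder features (assumption): the real X is not shown in the question.
rng = np.random.RandomState(0)
X = rng.rand(len(y), 4)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

tuned_parameters = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]

# Plain KFold does not stratify, so it places no minimum on class sizes.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# cv is given to the constructor; fit() only receives the data.
clf = GridSearchCV(SVC(), tuned_parameters, cv=cv)
clf.fit(X_train, y_train)

print(clf.best_params_)
```

As the comment warns, with this few samples per class the resulting scores are unlikely to be meaningful; the sketch only shows where `cv` goes.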
0

If you can't split the data into training and test sets so that every class is populated enough in each fold, try updating scikit-learn.

pip install -U scikit-learn

In newer versions the same message appears as a warning instead of an error, so the code will run.
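A small sketch of that behavior, assuming a recent scikit-learn with `sklearn.model_selection.StratifiedKFold`. The labels here are toy data made up for illustration: one class has a single member, as in the error message from the question.

```python
import warnings

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels (assumption, not the question's data): class 2 has one member.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2])
X = np.zeros((len(y), 1))  # features are irrelevant for building the folds

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    folds = list(StratifiedKFold(n_splits=3).split(X, y))

print("number of folds:", len(folds))
for w in caught:
    print("warning:", w.message)
```

The folds are still produced, but the under-populated class cannot appear in every fold, so the scores deserve the same caution as before.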

Bilal Dadanlar