130

I'm getting this weird error:

classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)

but then it also prints the f-score the first time I run:

metrics.f1_score(y_test, y_pred, average='weighted')

The second time I run it, it provides the score without the error. Why is that?

>>> y_pred = test.predict(X_test)
>>> y_test
array([ 1, 10, 35,  9,  7, 29, 26,  3,  8, 23, 39, 11, 20,  2,  5, 23, 28,
       30, 32, 18,  5, 34,  4, 25, 12, 24, 13, 21, 38, 19, 33, 33, 16, 20,
       18, 27, 39, 20, 37, 17, 31, 29, 36,  7,  6, 24, 37, 22, 30,  0, 22,
       11, 35, 30, 31, 14, 32, 21, 34, 38,  5, 11, 10,  6,  1, 14, 12, 36,
       25,  8, 30,  3, 12,  7,  4, 10, 15, 12, 34, 25, 26, 29, 14, 37, 23,
       12, 19, 19,  3,  2, 31, 30, 11,  2, 24, 19, 27, 22, 13,  6, 18, 20,
        6, 34, 33,  2, 37, 17, 30, 24,  2, 36,  9, 36, 19, 33, 35,  0,  4,
        1])
>>> y_pred
array([ 1, 10, 35,  7,  7, 29, 26,  3,  8, 23, 39, 11, 20,  4,  5, 23, 28,
       30, 32, 18,  5, 39,  4, 25,  0, 24, 13, 21, 38, 19, 33, 33, 16, 20,
       18, 27, 39, 20, 37, 17, 31, 29, 36,  7,  6, 24, 37, 22, 30,  0, 22,
       11, 35, 30, 31, 14, 32, 21, 34, 38,  5, 11, 10,  6,  1, 14, 30, 36,
       25,  8, 30,  3, 12,  7,  4, 10, 15, 12,  4, 22, 26, 29, 14, 37, 23,
       12, 19, 19,  3, 25, 31, 30, 11, 25, 24, 19, 27, 22, 13,  6, 18, 20,
        6, 39, 33,  9, 37, 17, 30, 24,  9, 36, 39, 36, 19, 33, 35,  0,  4,
        1])
>>> metrics.f1_score(y_test, y_pred, average='weighted')
C:\Users\Michael\Miniconda3\envs\snowflakes\lib\site-packages\sklearn\metrics\classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
0.87282051282051276
>>> metrics.f1_score(y_test, y_pred, average='weighted')
0.87282051282051276
>>> metrics.f1_score(y_test, y_pred, average='weighted')
0.87282051282051276

Also, why is there a trailing `'precision', 'predicted', average, warn_for)` in the error message? There is no opening parenthesis, so why does it end with a closing one? I am running sklearn 0.18.1 with Python 3.6.0 in a conda environment on Windows 10.

I also looked here and I don't know if it's the same bug. That SO post doesn't have a solution either.

  • There are some labels in y_true which don't appear in y_pred, and hence it is ill-defined – Vivek Kumar Apr 02 '17 at 02:25
  • @VivekKumar I am also getting the same warning. I have a balanced data set (500+500) and this warning seems to come up during the `clf = GridSearchCV(SVC(C=2), tuned_parameters, cv=cv, scoring='f1') clf.fit(X_train, y_train)` phase. It would be great to see what is causing the warning or how to rectify it. – salvu Sep 26 '17 at 11:59

8 Answers

178

As mentioned in the comments, some labels in y_test don't appear in y_pred. Specifically in this case, label '2' is never predicted:

>>> set(y_test) - set(y_pred)
{2}

This means that there is no F-score to calculate for this label, and thus the F-score for this case is considered to be 0.0. Since you requested an average of the score, you must take into account that a score of 0 was included in the calculation, and this is why scikit-learn is showing you that warning.
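
If you want to see exactly which label is pulling the average down, you can ask for the per-label scores. A minimal sketch, assuming `metrics` is `sklearn.metrics` and `np` is NumPy, as elsewhere in this answer:

labels = np.unique(np.concatenate([y_test, y_pred]))           # every label seen in either array
per_label_f1 = metrics.f1_score(y_test, y_pred, labels=labels, average=None)
print(labels[per_label_f1 == 0.0])                             # label 2 shows up here, since it was never predicted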

This brings me to why you don't see the error the second time. As I mentioned, this is a warning, which is treated differently from an error in Python. The default behavior in most environments is to show a specific warning only once. This behavior can be changed:

import warnings
warnings.filterwarnings('always')  # "error", "ignore", "always", "default", "module" or "once"

If you set this before importing the other modules, you will see the warning every time you run the code.

There is no way to avoid seeing this warning the first time, aside from setting warnings.filterwarnings('ignore'). What you can do is decide that you are not interested in the scores of labels that were not predicted, and then explicitly specify the labels you are interested in (which are labels that were predicted at least once):

>>> metrics.f1_score(y_test, y_pred, average='weighted', labels=np.unique(y_pred))
0.91076923076923078

The warning will be gone.

Shovalt
  • This is a great answer but I would caution against choosing to compute the f1 score using `unique(y_pred)` as this may yield misleading scores. – Robert Sim Jul 09 '18 at 13:47
  • @RobertSim can you please elaborate a bit on this? Thank you. – Akash Kandpal Aug 24 '18 at 11:08
  • @harrypotter0, I think what he meant was that using `unique(y_pred)` as a standard practice, without understanding what you're doing, may have unexpected consequences. What happens here is that the labels that were not predicted at all are simply ignored. As long as you actively *decide* that that is what you are interested in, that's ok. When using this method, I would personally always print out the non-predicted labels (using the set operations at the beginning of the answer), to make sure I don't miss that fact. – Shovalt Aug 27 '18 at 07:08
  • I am referring to this question as I am having a similar issue. In my case, when I check the length of `y_test` and `y_pred` those have the same length. So how could it be ill-defined when every true label has a predicted label? – akalanka Feb 28 '19 at 18:33
  • @akalanka, try using `np.unique(y_pred)` and `np.unique(y_test)`. These probably have different lengths. All of your ys have a predicted label, but not all labels were predicted at least once. – Shovalt Feb 28 '19 at 19:49
  • @RobertSim Misleading scores as in, will set the metric to 0 for those labels not present in the predictions right? That would push down the metric as it is summing 0 and dividing by all labels right? Talking about multiclass "macroaverage" – 3nomis Jul 28 '21 at 10:14
  • @3nomis in the example I gave using the "labels" parameter, it might push the metric up. In that case the missing labels will be excluded from the average. The default behavior, in contrast (the one that raises the warning), does include zeros for missing labels and pushes down the metric. – Shovalt Jul 29 '21 at 11:54
  • @Shovalt Thanks for the response. If you agree with me, I'd be against providing the labels and would leave the default. It is better to be a bit under-confident about the performance. Do you agree? – 3nomis Jul 29 '21 at 12:14
  • @3nomis I haven't actually ever worked with this metric (I have used f1 for binary classification though), so my intuition doesn't work here. I guess I'd be interested in knowing both with and without the missing labels, and perhaps even another metric that doesn't have this disadvantage. – Shovalt Jul 30 '21 at 16:07
  • set(y_test) - set(y_pred) was empty for me but still got the error – kkgarg Jan 02 '22 at 22:28
  • @kkgarg - might be there's a difference in data types, e.g. one is integers and the other floats. Assuming you don't have that many categories, you can print both of those sets and compare manually (or figure out why the set operation didn't work). In any case, `y_pred` should be a subset of `y_test` (well, technically a subset of `y_train`, but I do assume `y_test` doesn't have labels that weren't in the training set). – Shovalt Jan 03 '22 at 08:37
  • That's not the case with me. `y_pred` and `y_test` are of same types and have same number of labels. I think @petty.cf answer could be another possibility in my case, not sure though. – kkgarg Jan 03 '22 at 15:08
  • @Shovalt I understand this warning occurs because some classes in the test set never get predicted. What I don't understand is how to figure out why a classification algorithm behaves that way. For example, I fitted an `SVC` and an `RF` classifier on the same data, and `RF` doesn't predict a particular class in any run of the experiments. – arilwan Jul 06 '22 at 11:23
  • @arilwan I don't have a generic explanation. I can imagine several "excuses" but every specific case is different. – Shovalt Jul 08 '22 at 11:56
14

The same problem also happened to me when I was training my classification model. The cause is what the warning message says: for "labels with no predicted samples", there is a division by zero when computing the F1-score. I found another solution when I read the sklearn.metrics.f1_score docs; there is a note as follows:

When true positive + false positive == 0, precision is undefined; When true positive + false negative == 0, recall is undefined. In such cases, by default the metric will be set to 0, as will f-score, and UndefinedMetricWarning will be raised. This behavior can be modified with zero_division

The zero_division default value is "warn"; you can set it to 0 or 1 to avoid the UndefinedMetricWarning. It worked for me ;) Oh wait, there is another problem: when I used zero_division, my sklearn reported that there was no such keyword argument, because I was on scikit-learn 0.21.3. Just update your sklearn to the latest version by running pip install scikit-learn -U
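
For reference, a minimal sketch of the call that note describes (this assumes a scikit-learn version recent enough to have zero_division, i.e. newer than 0.21.3):

from sklearn import metrics

# Define the score for labels with no predicted samples explicitly
# (use zero_division=1 to count them as 1.0 instead), so no warning is raised:
score = metrics.f1_score(y_test, y_pred, average='weighted', zero_division=0)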

petty.cf
  • Not sure if they simply changed the warning message. But when I have the error, it already gives the reason as well: "UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. *Use `zero_division` parameter to control this behavior.*" – questionto42 Jul 11 '21 at 16:45
7

I ended up here with the same error, but after reading @Shovalt's answer I realized my test split was quite small. I had a large data set to start with, but had split it down so that one group was quite small. Once I made the sample size bigger, the warning went away and I got my F1 score. From this

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

to this

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
4

As I have noticed, this error occurs under two circumstances:

  1. If you have used train_test_split() to split your data, you have to make sure that you reset the index of the data (especially when it was taken from a pandas Series object): the y_train and y_test indices should be reset. The problem is that when you try to use one of the scores from sklearn.metrics, such as precision_score, it will try to match the shuffled indices of the y_test that you got from train_test_split().

So either use np.array(y_test) as y_true in the scores, or y_test.reset_index(drop=True) (see the sketch after this list).

  2. Then again, you can still get this error if your predicted 'True Positives' count is 0, which is used for precision, recall and F1 scores. You can visualize this using a confusion_matrix. If the classification is multilabel and you set average='weighted'/'micro'/'macro', you will get a result as long as the diagonal of the matrix is not 0.
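
A small sketch covering both points, assuming y_test is a pandas Series coming out of train_test_split() and y_pred is a NumPy array:

import numpy as np
from sklearn.metrics import precision_score, confusion_matrix

# 1. Drop or align the shuffled index before scoring:
precision_score(np.array(y_test), y_pred, average='weighted')
# or
precision_score(y_test.reset_index(drop=True), y_pred, average='weighted')

# 2. Inspect the diagonal of the confusion matrix: a 0 on the diagonal means
#    that class has no true positives.
print(np.diag(confusion_matrix(y_test, y_pred)))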

Hope this helps.

Ryan M
3

The accepted answer already explains well why the warning occurs. If you simply want to control the warnings, you can use precision_recall_fscore_support. It offers a (semi-official) argument warn_for that can be used to mute the warnings.

(_, _, f1, _) = metrics.precision_recall_fscore_support(y_test, y_pred,
                                                        average='weighted', 
                                                        warn_for=tuple())

As mentioned already in some comments, use this with care.

normanius
0

As the error message states, the method used to get the F-score is from the "Classification" part of sklearn - thus the references to "labels".

Do you have a regression problem? Sklearn provides a "F score" method for regression under the "feature selection" group: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html
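
For completeness, a tiny sketch of that regression-side F-test (the data here is made up purely to show the call):

import numpy as np
from sklearn.feature_selection import f_regression

X = np.random.rand(100, 5)   # hypothetical feature matrix
y = np.random.rand(100)      # hypothetical continuous target

F_values, p_values = f_regression(X, y)   # one F statistic and p-value per feature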

In case you do have a classification problem, @Shovalt's answer seems correct to me.

akoeltringer
  • You are correct to point out the difference between regression and classification, but I concluded that this is a classification problem from the discrete nature of `y_test` and `y_pred` in the question. – Shovalt Mar 04 '18 at 08:25
0

I checked, as Shovalt suggested, the difference between the sets of truth values and predictions in a multilabel case but it did not help me to solve my problem.

So I looked into the sklearn.metrics.precision_recall_fscore_support source code (which is called by f1_score) to check how it works.

The code triggering the warning is the following:

precision = _prf_divide(
    tp_sum, pred_sum, "precision", "predicted", average, warn_for, zero_division
)
recall = _prf_divide(
    tp_sum, true_sum, "recall", "true", average, warn_for, zero_division
)
  • tp_sum corresponds to TP (True Positives)
  • pred_sum corresponds to TP + FP (False Positives)
  • true_sum corresponds to TP + FN (False Negatives)
  • the first parameter of _prf_divide is the numerator of the division
  • the second parameter of _prf_divide is the denominator of the division

As soon as pred_sum or true_sum is equal to 0, it triggers the warning because division by 0 is not allowed.

In order to get these different values, use sklearn.metrics.multilabel_confusion_matrix. The result is a 3-dimensional array. You can see it as a list of 2x2 matrices where each matrix represents the True Negatives (TN), False Positives (FP), False Negatives (FN) and True Positives (TP) for each of your labels, structured as follows:

[Image: multilabel_confusion_matrix output]
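
As a sketch, this is how you could use it to spot the labels that will trigger the warning (assuming y_test and y_pred as in the question):

import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

labels = np.unique(np.concatenate([y_test, y_pred]))
# One 2x2 matrix per label, laid out as [[TN, FP], [FN, TP]]:
mcm = multilabel_confusion_matrix(y_test, y_pred, labels=labels)

for label, m in zip(labels, mcm):
    tn, fp, fn, tp = m.ravel()
    pred_sum = tp + fp   # denominator of precision
    true_sum = tp + fn   # denominator of recall
    if pred_sum == 0 or true_sum == 0:
        print(label, "has pred_sum or true_sum equal to 0, so precision/recall is undefined")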

In my opinion, the problem most likely comes from the model's inability to predict some labels, due to poor training or a lack of samples.

Minilouze
0

This call works for me:

sklearn.metrics.f1_score(y_true, y_pred,average='weighted',zero_division=0)
Ashiq Imran