
I'm trying to use log_loss as the scoring parameter of GridSearchCV to tune this multi-class (6-class) classifier, but I don't understand how to give it a labels parameter. Even if I passed sklearn.metrics.log_loss directly, the labels present would change with each cross-validation split, so how can I supply the labels parameter?

I'm using Python v3.6 and Scikit-Learn v0.18.1

How can I use GridSearchCV with log_loss with multi-class model tuning?

My class representation:

1    31
2    18
3    28
4    19
5    17
6    22
Name: encoding, dtype: int64

My code:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

param_test = {"criterion": ["friedman_mse", "mse", "mae"]}
gsearch_gbc = GridSearchCV(estimator=GradientBoostingClassifier(n_estimators=10),
                           param_grid=param_test, scoring="log_loss",
                           n_jobs=1, iid=False, cv=cv_indices)
gsearch_gbc.fit(df_attr, Se_targets)

Here's the tail end of the error and the full one is here https://pastebin.com/1CshpEBN:

ValueError: y_true contains only one label (1). Please provide the true labels explicitly through the labels argument.

UPDATE: I ended up building the scorer like this, based on @Grr's answer:

import numpy as np
from sklearn import metrics

log_loss_build = lambda y: metrics.make_scorer(metrics.log_loss, greater_is_better=False,
                                               needs_proba=True, labels=sorted(np.unique(y)))
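One caveat: make_scorer's keyword arguments have changed across scikit-learn releases (needs_proba was later replaced), so a plain scorer callable with the (estimator, X, y) signature is a more version-agnostic sketch of the same idea. The data and classifier below are hypothetical stand-ins for df_attr / Se_targets, not the question's actual dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV

def make_neg_log_loss(labels):
    """Build a scorer callable with the (estimator, X, y) signature GridSearchCV accepts."""
    def score(estimator, X, y_true):
        proba = estimator.predict_proba(X)
        # Fixing `labels` up front means every fold is scored against the
        # full class set, even if a given fold is missing some classes.
        return -log_loss(y_true, proba, labels=labels)
    return score

# Hypothetical stand-ins for df_attr / Se_targets
rng = np.random.RandomState(0)
X = rng.rand(60, 4)
y = np.repeat([1, 2, 3, 4, 5, 6], 10)

gsearch = GridSearchCV(GradientBoostingClassifier(n_estimators=10),
                       {"criterion": ["friedman_mse"]},
                       scoring=make_neg_log_loss(sorted(np.unique(y))), cv=3)
gsearch.fit(X, y)
```

The scorer returns the negative log loss because GridSearchCV maximizes its score, matching the greater_is_better=False convention above.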
O.rka
  • Print your `Se_targets` here. And also have a look at http://scikit-learn.org/stable/modules/model_evaluation.html#multilabel-ranking-metrics – Vivek Kumar Apr 13 '17 at 02:17
  • @O.rka : labels=sorted(np.unique(y)) . Here y contains labels for entries in train set, right? Or does it contain labels for all the entries in the dataset? – Debbie Oct 11 '18 at 08:31

2 Answers


My assumption is that somehow your data split has only one class label in y_true. While this seems unlikely based on the distribution you posted, I guess it is possible. I haven't run into this before, but it seems that [sklearn.metrics.log_loss](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html) requires the labels argument when y_true contains only a single label. The wording of that section of the documentation also suggests the method assumes binary classification if labels is not passed.
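That failure mode is easy to reproduce in isolation: with a single-class y_true, log_loss cannot infer the class set and raises exactly this error unless labels is given. The toy arrays below are purely illustrative:

```python
from sklearn.metrics import log_loss

# y_true from a degenerate fold: only class 1 is present
y_true = [1, 1, 1]
proba = [[0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]

try:
    log_loss(y_true, proba)  # no labels: the class set can't be inferred
except ValueError as e:
    print(e)  # "y_true contains only one label (1). Please provide the true labels ..."

# Passing the full label set resolves it
loss = log_loss(y_true, proba, labels=[1, 2])
```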

Now, as you correctly assume, you should pass the labels along with log_loss, i.e. scorer=sklearn.metrics.log_loss(labels=your_labels).

Grr
  • would my labels be the same as the "y_true"? – O.rka Apr 13 '17 at 16:35
  • I believe you would pass your list of classes, i.e. `[1,2,3,4,5,6]`. It looks like the `labels=` kwarg uses the `LabelBinarizer` to create the label classes and then binarize the y_true values. Check it out in the source [here](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/classification.py#L1557-L1677) – Grr Apr 13 '17 at 16:54
  • 3
    `TypeError: log_loss() missing 2 required positional arguments: 'y_true' and 'y_pred'` The method takes 2 default arguments that will change during cross-validation. – O.rka Apr 13 '17 at 17:27
  • you may have to use the [`make_scorer`](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html) – Grr Apr 13 '17 at 18:02

You can simply specify scoring="neg_log_loss" (or "log_loss" in older versions), which will use the negative log loss.

Andreas Mueller