
I am trying out multi-class classification with xgboost and I've built a model using this code:

import xgboost as xgb

clf = xgb.XGBClassifier(max_depth=7, n_estimators=1000)

clf.fit(byte_train, y_train)
train1 = clf.predict_proba(train_data)
test1 = clf.predict_proba(test_data)

This gave me some good results; I've got log-loss below 0.7 for my case. But after looking through a few pages I've found that we have to use another objective in XGBClassifier for multi-class problems. Here's what is recommended by those pages.

clf = xgb.XGBClassifier(max_depth=5, objective='multi:softprob', n_estimators=1000, 
                        num_class=9)

clf.fit(byte_train, y_train)  
train1 = clf.predict_proba(train_data)
test1 = clf.predict_proba(test_data)

This code also works, but it takes a lot more time to complete than my first code.

Why does my first code also work for the multi-class case? I have checked that its default objective is binary:logistic, which is used for binary classification, yet it worked really well for multi-class. Which one should I use if both are correct?

user_12
  • not relative to the differing objectives, but for the softprob, does adding the parallel/threading parameter *n_jobs=-1* speed up the fitter somewhat compared to the hidden default of n_jobs=1? – develarist Mar 13 '20 at 03:29

3 Answers


In fact, even though the default objective parameter of XGBClassifier is binary:logistic, it internally checks the number of classes in the label y. When the number of classes is greater than 2, it changes the objective to multi:softprob and sets num_class accordingly.

https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py

class XGBClassifier(XGBModel, XGBClassifierBase):
    # pylint: disable=missing-docstring,invalid-name,too-many-instance-attributes
    def __init__(self, objective="binary:logistic", **kwargs):
        super().__init__(objective=objective, **kwargs)

    def fit(self, X, y, sample_weight=None, base_margin=None,
            eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):
        # pylint: disable = attribute-defined-outside-init,arguments-differ

        evals_result = {}
        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)

        xgb_options = self.get_xgb_params()

        if callable(self.objective):
            obj = _objective_decorator(self.objective)
            # Use default value. Is it really not used ?
            xgb_options["objective"] = "binary:logistic"
        else:
            obj = None

        if self.n_classes_ > 2:
            # Switch to using a multiclass objective in the underlying
            # XGB instance
            xgb_options['objective'] = 'multi:softprob'
            xgb_options['num_class'] = self.n_classes_
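
A quick way to see this for yourself is to fit on a small multi-class dataset with the default objective and then inspect what the booster actually trained with. This is only a minimal sketch (it assumes xgboost and scikit-learn are installed, and the exact JSON layout returned by save_config() can differ between xgboost versions):

import json
import xgboost as xgb
from sklearn.datasets import make_classification

# Toy 3-class problem
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

clf = xgb.XGBClassifier(n_estimators=10)   # objective left at the default 'binary:logistic'
clf.fit(X, y)

# predict_proba has one column per class, so the multi-class case was handled
print(clf.predict_proba(X).shape)          # (300, 3)

# The trained booster's config reports the objective that was really used
config = json.loads(clf.get_booster().save_config())
print(config["learner"]["objective"]["name"])   # expected: 'multi:softprob'
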
Joey Gao

By default, XGBClassifier uses objective='binary:logistic'. When you use this objective, it employs one of these strategies: one-vs-rest (also known as one-vs-all) or one-vs-one. It may not be the right choice for your problem at hand.

When you use objective='multi:softprob', the output is a matrix of size (number of data points) x (number of classes), i.e. one probability per class for each data point. As a result, the time complexity of your code increases.

Try setting objective='multi:softmax' in your code. It is more apt for a multi-class classification task.
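
To make the difference concrete, here is a minimal sketch (assuming xgboost and scikit-learn are available; the dataset is just a toy one) contrasting the two objectives: multi:softprob gives a full probability matrix via predict_proba, while multi:softmax makes the booster emit a single predicted class per row:

import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

# multi:softprob -> per-class probabilities for every row
prob_clf = xgb.XGBClassifier(objective='multi:softprob', n_estimators=10)
prob_clf.fit(X, y)
print(prob_clf.predict_proba(X).shape)   # (300, 3)

# multi:softmax -> predict() returns one hard class label per row
max_clf = xgb.XGBClassifier(objective='multi:softmax', n_estimators=10)
max_clf.fit(X, y)
print(max_clf.predict(X)[:5])            # e.g. [2 0 1 1 0]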

Saurabh Jain

By default, XGBClassifier (like many classifiers) uses a binary objective, but what it does internally is a one-vs-rest style classification, i.e. if you have 3 classes it will give a result like (0 vs 1 & 2). If you're dealing with more than 2 classes you should always use softmax. Softmax turns logits into probabilities that sum to 1, and on that basis it predicts which class has the highest probability. As you can see, the complexity increases, as Saurabh mentioned in his answer, so it will take more time.
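
As a small illustration of that softmax step (plain numpy, with made-up logits purely for the example):

import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability, exponentiate, then normalize
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for 3 classes
probs = softmax(logits)
print(probs)           # roughly [0.659 0.242 0.099]
print(probs.sum())     # 1.0
print(probs.argmax())  # 0 -> the class with the highest probability is predicted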

Sagar Dubey
  • But if we can also achieve the results using one-vs-rest, why would we choose the softmax objective? Does it always improve accuracy compared to the one-vs-rest approach? – user_12 Sep 21 '19 at 01:04
  • One-vs-rest trains a classifier over two classes, while softmax trains over n classes. Suppose you have 3 classes x1, x2, x3. In one-vs-rest it will take x1 as one class and (x2, x3) as the other class; it is a binary classifier. In softmax it will train for 3 different classes. You get 3 different probabilities with softmax but 2 probabilities with one-vs-rest. – Sagar Dubey Sep 22 '19 at 19:48
  • Well, both objectives return probabilities for n classes, right? So we could use either approach, couldn't we? – user_12 Sep 23 '19 at 04:15
  • If out of 3 classes you're interested in only two, say positive and negative, then you can use one-vs-rest; otherwise softmax is preferred. Suppose you have five classes: Positive, Negative, Somewhat Positive, Somewhat Negative, Neutral. Here you can go for one-vs-rest, as you can merge positive and neutral into one and make a prediction, but if you want the probabilities of all the classes then softmax is the way to go. I hope you get it. :) – Sagar Dubey Sep 23 '19 at 06:30
  • But even with OneVsRest we can access probabilities with xgboost's predict_proba(), can't we? – user_12 Sep 23 '19 at 17:48
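
For what it's worth, an explicit one-vs-rest wrapper does expose predict_proba too. A minimal sketch (using scikit-learn's OneVsRestClassifier purely for illustration; it is not part of the question's code):

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

# One independent binary XGBClassifier per class
ovr = OneVsRestClassifier(xgb.XGBClassifier(objective='binary:logistic',
                                            n_estimators=10))
ovr.fit(X, y)

# Per-class scores are available here as well, but each column comes from a
# separately trained binary model, whereas multi:softprob trains one model
# with a softmax over all classes.
print(ovr.predict_proba(X).shape)   # (300, 3)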