
When I use xgboost to train my data for a two-class (binary) classification problem, I'd like to use early stopping to get the best model, but I'm confused about which value to use in my predict call, since early stopping leaves three different fields on the model. For example, should I use

preds = model.predict(xgtest, ntree_limit=bst.best_iteration)

or should I use

preds = model.predict(xgtest, ntree_limit=bst.best_ntree_limit)

or are both correct but meant for different circumstances? If so, how can I judge which one to use?

Here is the relevant quotation from the xgboost documentation, but it doesn't give a reason, and I also couldn't find a comparison of those parameters:

Early Stopping

If you have a validation set, you can use early stopping to find the optimal number of boosting rounds. Early stopping requires at least one set in evals. If there's more than one, it will use the last.

train(..., evals=evals, early_stopping_rounds=10)

The model will train until the validation score stops improving. Validation error needs to decrease at least every early_stopping_rounds to continue training.

If early stopping occurs, the model will have three additional fields: bst.best_score, bst.best_iteration and bst.best_ntree_limit. Note that train() will return a model from the last iteration, not the best one.

Prediction

A model that has been trained or loaded can perform predictions on data sets.

# 7 entities, each contains 10 features 
data = np.random.rand(7, 10) 
dtest = xgb.DMatrix(data) 
ypred = bst.predict(dtest)

If early stopping is enabled during training, you can get predictions from the best iteration with bst.best_ntree_limit:

ypred = bst.predict(dtest, ntree_limit=bst.best_ntree_limit)
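
To make the setup concrete, here is a minimal sketch of what I am running (the dataset, split, and parameters are just placeholders, and it assumes an xgboost version that still accepts ntree_limit in predict):

import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Toy binary-classification data, only to make the example self-contained.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

dtrain = xgb.DMatrix(X_train, label=y_train)
xgtest = xgb.DMatrix(X_valid, label=y_valid)

params = {"objective": "binary:logistic", "eval_metric": "logloss"}
bst = xgb.train(
    params,
    dtrain,
    num_boost_round=500,
    evals=[(dtrain, "train"), (xgtest, "valid")],  # the last set drives early stopping
    early_stopping_rounds=10,
)

# The three fields set when early stopping occurs:
print(bst.best_score, bst.best_iteration, bst.best_ntree_limit)

# The two predict calls I am unsure about:
preds_a = bst.predict(xgtest, ntree_limit=bst.best_iteration)
preds_b = bst.predict(xgtest, ntree_limit=bst.best_ntree_limit)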

Thanks in advance.

LancelotHolmes
  • I was wondering this same thing. In a few tests that I've run, `bst.best_iteration` seems to be just 1 lower than `bst.best_ntree_limit`. My thinking is that `bst.best_iteration` gives you the number of the best one, but then you want to set the limit to 1 higher (so that it gets to the best one before it hits the limit). This seems a bit silly and simplistic though. Hence, I am still wondering... Have you noticed the same thing? – seth127 Jul 05 '17 at 18:01
  • @seth127, that is not necessarily true; `bst.best_iteration` being just 1 lower than `bst.best_ntree_limit` may be caused by the number of iterations you set. You can try setting the iterations higher. – LancelotHolmes Jul 05 '17 at 23:40
  • In my understanding, `best_ntree_limit` gives you the total number of trees, but `best_iteration` gives you the iteration *number*, which is one lower possibly because the iteration number starts from zero. – Nimit Pattanasri Aug 03 '17 at 13:23
  • Surprised that this question still doesn't have a good answer, or any clarification in the documentation yet! – information_interchange May 31 '19 at 20:45
  • The documentation lacks a clear explanation on this, but it seems: `best_iteration` is the best iteration, starting at 0. `best_ntree_limit` is the best number of trees. By default, it should be equal to `best_iteration` + 1, since iteration 0 has 1 tree, iteration 1 has 2 trees, and so on. BUT, you can define `num_parallel_tree`, which allows multiple trees to grow at each iteration. `best_score` should be the score at the best `ntree_limit`, but it's unclear how it works with `num_parallel_tree` > 1, since evaluation is not done after every tree is built. – CoMartel Apr 30 '20 at 08:24
  • My conclusion is: do not use `num_parallel_tree` (it defeats the whole boosting purpose anyway), and use `ntree_limit=bst.best_ntree_limit`. – CoMartel Apr 30 '20 at 08:26

1 Answer


In my point of view, both parameters refer to the same thing, or at least serve the same goal. But I would rather use:

preds = model.predict(xgtest, ntree_limit=bst.best_iteration)

From the source code, we can see that best_ntree_limit is going to be dropped in favor of best_iteration.

def _get_booster_layer_trees(model: "Booster") -> Tuple[int, int]:
    """Get number of trees added to booster per-iteration.  This function will be removed
    once `best_ntree_limit` is dropped in favor of `best_iteration`.  Returns
    `num_parallel_tree` and `num_groups`.
    """

Additionally, best_ntree_limit has been removed from the EarlyStopping documentation page.

So I think this attribute exists only for backward-compatibility reasons. From this code snippet and the documentation, we can therefore conclude that best_ntree_limit is, or will be, deprecated.
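
For illustration, here is a short sketch of the prediction call, assuming the bst and xgtest objects from the question, together with the newer iteration_range spelling that recent xgboost releases document in place of ntree_limit (treat the exact version where it appeared as an assumption on my part):

# bst and xgtest are assumed to come from the question's training setup.

# Older API: pass the number of trees to use.
preds = bst.predict(xgtest, ntree_limit=bst.best_iteration)

# Newer API (recent xgboost releases): pass the half-open range of boosting
# rounds; best_iteration is 0-based, so the range ends at best_iteration + 1.
preds = bst.predict(xgtest, iteration_range=(0, bst.best_iteration + 1))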

Antoine Dubuis