
I have a question about XGBoost.

How can I find out the number of trees created by XGBoost? Unlike RandomForest, where the person building the model decides how many trees are made, XGBoost basically keeps creating trees until the loss function reaches a certain value. That is why I want to know this.

Thank you.

lstodd
kanam

3 Answers


It's a bit roundabout, but what I'm currently doing is dumping the model (XGBoost produces a list where each element is a string representation of a single tree) and then counting how many elements are in the list:

# clf is an XGBoost model fitted using the sklearn API
dump_list = clf.get_booster().get_dump()
num_trees = len(dump_list)
OmerB

In Java, there does not appear to be a direct way to do this. You can, however, use the result of a model dump to get the actual number of trees. Using a trained Booster:

int numberOfTrees = booster.getModelDump("", false, "text").length;
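If you only have a text dump on disk rather than a live Booster, the same idea works by counting tree headers. Below is a minimal stdlib-only sketch. It assumes the text format written by Python's `Booster.dump_model`, in which each tree starts with a `booster[i]:` line. `count_trees_in_text_dump` is a hypothetical helper, and the sample dump is hand-written for illustration rather than taken from a real model.

```python
import re

# Assumption: each tree in a text-format dump begins with a "booster[i]:" line
# at the start of a line (as written by Python's Booster.dump_model).
_TREE_HEADER = re.compile(r"^booster\[\d+\]:", re.MULTILINE)


def count_trees_in_text_dump(dump_text: str) -> int:
    """Hypothetical helper: count tree headers in a text-format XGBoost dump."""
    return len(_TREE_HEADER.findall(dump_text))


# Hand-written illustrative dump with two trees (node lines are tab-indented).
sample = """booster[0]:
0:[f0<0.5] yes=1,no=2,missing=1
\t1:leaf=0.1
\t2:leaf=-0.1
booster[1]:
0:leaf=0.05
"""

print(count_trees_in_text_dump(sample))
```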
Nicio

This is controlled by you as the user. If you use the native training API, it is controlled by num_boost_round (default is 10); from the docs:

num_boost_round (int) – Number of boosting iterations.

If you use the sklearn API, it is controlled by n_estimators (default is 100); from the docs:

n_estimators : int Number of boosted trees to fit.

The only caveat is that this is the maximum number of trees to fit: fitting can stop earlier if you set up an early stopping criterion. I'm not sure whether you use that.

Mischa Lisovyi
  • I wonder why this answer was downvoted. Did I misunderstand the question, or is the answer wrong? – Mischa Lisovyi Oct 15 '18 at 22:24
  • @Mykhalio OP wants to know the actual number of trees generated for a given fitted XGBoost object. As you wrote, this might be less than the user-defined parameter, so the parameter can't be used. – OmerB Jan 08 '19 at 10:01
  • @OmerB that is one interpretation of the original question. However, the question does *not* mention early stopping and is vague. If no early stopping is configured, then my answer is correct. – Mischa Lisovyi Jan 08 '19 at 14:45
  • @Mykhalio There are other effects as well. For example, in a multi-class problem XGBoost creates a separate tree for each class in every boosting round, so with 3 classes and 10 boosting rounds you might get 30 trees. In summary, that parameter can be used neither as an upper nor as a lower bound. – OmerB Jan 09 '19 at 11:39