
During hyperparameter optimization of a boosted-trees algorithm such as xgboost or lightgbm, is it possible to directly control the minimum (not just the maximum) number of boosting rounds (estimators/trees) when using early stopping? This need is motivated by the observation that models that stop training after too few rounds are consistently underfitted (their metrics are significantly worse than those of state-of-the-art models, which tend to have more boosting rounds).

The only solution I know of is an indirect one: adjusting a linked hyperparameter, the learning rate (reducing its upper limit in the search space). When set too high, the learning rate can lead to underfitted models and thus cause training to stop too quickly, i.e. with too few boosting rounds.
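For illustration, a minimal sketch of this indirect workaround, assuming a scikit-learn random search over xgboost's `XGBClassifier`; the bounds, the other parameters and the `X_train`/`y_train` names are placeholders, not a recommendation:

```python
# Sketch only: shrink the learning-rate upper bound so that sampled models
# are less prone to converging (and early-stopping) after very few rounds.
from scipy.stats import loguniform, randint
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

param_distributions = {
    # A typical upper bound of e.g. 0.3 reduced to 0.1 (placeholder values)
    # to discourage models that stop after too few boosting rounds.
    "learning_rate": loguniform(0.01, 0.1),
    "max_depth": randint(3, 10),
    "subsample": [0.7, 0.8, 0.9, 1.0],
}

search = RandomizedSearchCV(
    XGBClassifier(n_estimators=500, tree_method="hist"),
    param_distributions=param_distributions,
    n_iter=20,
    cv=3,
    scoring="roc_auc",
)
# search.fit(X_train, y_train)  # X_train / y_train are placeholders
```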

mirekphd

1 Answer


Have you experimented with varying the parameter that sets the number of early stopping rounds (`early_stopping_rounds`)? Depending on the data, the number of rounds and the learning rate used, I have seen this parameter set as low as 5 and as high as 500.

If you provide some sample data and code, it may be easier to suggest something.
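For example, here is a minimal sketch of treating `early_stopping_rounds` itself as a value to vary, using xgboost's native API on synthetic data (the data and parameter values are placeholders, not a recommendation):

```python
# Sketch only: vary the early-stopping patience and watch how the
# resulting number of boosting rounds changes.
import numpy as np
import xgboost as xgb

# Synthetic data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + rng.normal(size=2000) > 0).astype(int)
dtrain = xgb.DMatrix(X[:1500], label=y[:1500])
dval = xgb.DMatrix(X[1500:], label=y[1500:])

params = {"objective": "binary:logistic", "eta": 0.05, "max_depth": 6}

for esr in (5, 50, 500):  # candidate early_stopping_rounds values
    booster = xgb.train(
        params,
        dtrain,
        num_boost_round=5000,         # generous upper limit on rounds
        evals=[(dval, "validation")],
        early_stopping_rounds=esr,
        verbose_eval=False,
    )
    # Rounds actually used before early stopping kicked in.
    print(esr, booster.best_iteration)
```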

cousin_pete
  • Excellent idea! Will give it a try and come back with results. I'm already varying it between 10 and 50, so I'm probably setting the range too low for the desired number of rounds, judging from your experience of how high the upper limit can go and by intuition (shorter stopping windows would react to noise more quickly and stop model training unnecessarily early). – mirekphd Jun 28 '20 at 19:53
  • Hi mirekphd, one technique I have found useful, at least to get ballpark values for hyperparameters, is to use a random search method as suggested by Jonathan Ratschat and also by Yang Lui. Grid methods are most common but can be slow. – cousin_pete Jun 29 '20 at 02:02
  • So I did a small A/B test on 2x10 models, all with `max_num_boost_round == 5000`. Setting `early_stopping_rounds == 5` gives a mean number of rounds (over CV folds and those 10 models) of 1020, while `early_stopping_rounds == 500` gives a mean of 3150.75. So a 100x higher stopping-rounds setting gives us on average a 3x higher number of boosting rounds in the model (a rough sketch of this kind of comparison follows these comments). It's not perfect of course, but it looks like it indeed works! :) Thank you! – mirekphd Jun 29 '20 at 20:24
  • Pleased to hear this, mirekphd! The "morphology" of boosting algorithms is as diverse as the range of datasets; finding the best morphology may be time-consuming and futile, but finding an imperfect yet useful one for the task at hand is an honourable goal!? – cousin_pete Jun 30 '20 at 02:39
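For reference, a rough sketch of the kind of comparison described in the comments above, using `xgb.cv` on synthetic data; the parameters and data are placeholders, not the original experiment:

```python
# Sketch only: a CV-based comparison of two early_stopping_rounds settings,
# reporting the number of boosting rounds actually kept.
import numpy as np
import xgboost as xgb

# Synthetic data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + rng.normal(size=2000) > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {"objective": "binary:logistic", "eta": 0.05, "max_depth": 6}

for esr in (5, 500):
    cv_results = xgb.cv(
        params,
        dtrain,
        num_boost_round=5000,        # the max_num_boost_round of the comment above
        nfold=5,
        early_stopping_rounds=esr,
        seed=0,
    )
    # With early stopping, xgb.cv truncates its output at the best iteration,
    # so the number of returned rows approximates the rounds actually kept.
    print(esr, len(cv_results))
```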