
In the xgb.cv function (from the library xgboost), one of the options is early_stopping_rounds. The description of this option is:

If NULL, the early stopping function is not triggered. If set to an integer k, training with a validation set will stop if the performance doesn't improve for k rounds. Setting this parameter engages the cb.early.stop callback.

What exactly is meant by "if the performance doesn't improve for k rounds"? Is there a tolerance level tol set for this? I.e., if the difference in the performance metric between two consecutive rounds is < tol for k rounds? I want to know what the tolerance level is for xgb.cv but cannot find it anywhere in the documentation.

Or does it simply mean that training stops if the metric fails to improve at all, by any amount, for k consecutive rounds?

Adrian

1 Answer


"Performance" here means your chosen measure of the accuracy to the training results. It could be like mean square error etc.

Take mean squared error as an example. xgb.cv trains the model using cross-validation: it splits the data into a number of equal folds (for example, 5), trains the model on 4 of them, and validates it on the remaining fold. This process is repeated 5 times, each time holding out a different fold for validation. The reported performance is the average of the 5 held-out folds' mean squared errors.
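For concreteness, here is a minimal sketch of calling xgb.cv with early_stopping_rounds in R, assuming the agaricus sample data that ships with the xgboost package:

```r
library(xgboost)

# Sample data bundled with the xgboost package
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

cv <- xgb.cv(
  params = list(objective = "binary:logistic", eval_metric = "error"),
  data = dtrain,
  nrounds = 200,               # upper bound on boosting rounds
  nfold = 5,                   # 5-fold cross-validation as described above
  early_stopping_rounds = 10,  # k = 10: stop after 10 rounds without improvement
  verbose = FALSE
)

cv$best_iteration  # round with the best mean validation metric
```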

The XGBoost training process takes many boosting rounds to reach a good result. But how does it know when to stop training, so that it avoids overfitting (preserving predictive power) while achieving the lowest error (accuracy)? This is where early stopping kicks in.

The process is roughly this: in the current round, train the model and compute the training and validation errors. If the validation error is higher than the lowest validation error seen in any previous round, count the number of rounds since that best round. If the count exceeds the pre-set k, stop the training process and return the model. A simplified sketch of that bookkeeping follows.
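Here is an illustrative sketch of that counter logic in R. It is not xgboost's actual implementation, and train_one_round is a hypothetical helper standing in for fitting one more boosting round:

```r
best_score <- Inf   # lowest validation error seen so far
best_round <- 0     # round at which it occurred
k <- 10             # early_stopping_rounds
nrounds <- 200      # maximum number of boosting rounds

for (round in 1:nrounds) {
  # train_one_round() is a hypothetical helper that fits one more
  # boosting round and returns the resulting validation error
  validation_error <- train_one_round(round)

  if (validation_error < best_score) {
    # strict comparison: any improvement, however small, resets the count
    best_score <- validation_error
    best_round <- round
  } else if (round - best_round >= k) {
    break  # k rounds without improvement: stop training
  }
}
```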

[Figure: early stopping graph]

Another reason to keep k reasonably large is to avoid stopping at a local minimum of the validation error: allowing several further rounds tests whether the error starts improving again.

As for the tolerance you mention: the early-stopping check compares strictly against the best score seen so far, so any improvement, however small, resets the count; there is no separate tolerance parameter. A tolerance of the kind you describe may instead refer to the gradient boosting process within each round, which is briefly discussed on xgboost's website.
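If you want to verify this round by round, you can inspect the evaluation log that xgb.cv returns (using the cv object from the sketch above):

```r
cv$evaluation_log  # data.table with the per-round mean and std of the metric
```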

Sixiang.Hu