In LightGBM, there are two parameters related to bagging:

  • bagging_fraction
  • bagging_freq: frequency for bagging. 0 means disable bagging; k means perform bagging at every k iteration. Note: to enable bagging, bagging_fraction should be set to a value smaller than 1.0 as well.

I couldn't find a more detailed explanation of how this bagging works in GBDT. Can anybody give me a more detailed explanation?

Kid

1 Answer

The code does what the documentation says: it samples a subset of the training examples of size bagging_fraction * N_train_examples, and the i-th tree is trained on this subset. The sampling can be redone for every tree (i.e. every iteration, with bagging_freq=1) or only once every bagging_freq trees, in which case the same subset is reused until the next resampling.

For example, bagging_fraction=0.5, bagging_freq=10 means that a new sample of 0.5*N_train_examples entries is drawn every 10 iterations, and each sample is used for the 10 trees trained until the next draw.
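To make the schedule concrete, here is a minimal sketch in plain Python that only simulates which rows each tree would see. It is not LightGBM's actual implementation: the function name `bagging_schedule`, the seed handling, and the choice to resample at iterations 0, 10, 20, ... are assumptions made for illustration.

```python
import random


def bagging_schedule(n_train, n_iterations, bagging_fraction, bagging_freq, seed=0):
    """Hypothetical helper: return, per iteration, the row indices a tree would train on."""
    rng = random.Random(seed)
    all_rows = list(range(n_train))
    # Bagging is only active if both parameters enable it (see the docs note above).
    bagging_on = bagging_freq > 0 and bagging_fraction < 1.0
    subset_size = int(bagging_fraction * n_train)
    subset = all_rows  # without bagging, every tree sees all rows
    schedule = []
    for i in range(n_iterations):
        # Assumption: a fresh subset is drawn whenever i is a multiple of bagging_freq.
        if bagging_on and i % bagging_freq == 0:
            subset = sorted(rng.sample(all_rows, subset_size))
        schedule.append(subset)  # reused until the next resampling
    return schedule


schedule = bagging_schedule(
    n_train=1000, n_iterations=30, bagging_fraction=0.5, bagging_freq=10
)
n_distinct_subsets = len({tuple(rows) for rows in schedule})
print(len(schedule[0]), n_distinct_subsets)
```

With these settings each tree is trained on 500 rows, and trees 0-9, 10-19 and 20-29 each share one subset. In LightGBM itself the same behaviour would be requested by putting `bagging_fraction` and `bagging_freq` into the parameter dictionary.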

Mischa Lisovyi
  • Does this mean that a lower bagging_frequency will result in less overfitting? – Tomward Matthias Dec 14 '22 at 19:24
  • When one looks at 2 extrema: `bagging_freq=1` and `bagging_freq=`, then yes, the latter option will lead to more overfitting. However, it doesn't mean that small differences in the parameter lead to a strong advantage, e.g. that the model with `bagging_freq=2` will be much more overfitted than with `bagging_freq=1`. As long as `bagging_freq << ` you should be good – Mischa Lisovyi Dec 20 '22 at 10:04