
I am training a model using cross validation like so:

classifier = lgb.Booster(
    params=params, 
    train_set=lgb_train_set,
)

result = lgb.cv(
    init_model=classifier,
    params=params, 
    train_set=lgb_train_set,
    num_boost_round=1000,
    early_stopping_rounds=20,
    verbose_eval=50,
    shuffle=True
)

I would like to continue training the model by running the second command multiple times (maybe with a new training set or with different parameters), so that it keeps improving the model.

However, when I try this it is clear that the model is starting from scratch each time.

Is there a different approach to do what I am intending?

William Entriken

4 Answers


This can be solved using the init_model option of lightgbm.train, which accepts one of two objects:

  1. a filename of LightGBM model, or
  2. a lightgbm Booster object

Code illustration:

import numpy as np
import lightgbm as lgb

data = np.random.rand(1000, 10)  # 1000 entities, each with 10 features
label = np.random.randint(2, size=1000)  # binary target
train_data = lgb.Dataset(data, label=label, free_raw_data=False)
params = {}

# Initialize with 10 iterations
gbm_init = lgb.train(params, train_data, num_boost_round=10)
print("Initial iter# %d" % gbm_init.current_iteration())

# Example of option #1 (pass a file):
gbm_init.save_model('model.txt')
gbm = lgb.train(params, train_data, num_boost_round=10,
                init_model='model.txt')
print("Option 1 current iter# %d" % gbm.current_iteration())


# Example of option #2 (pass a lightgbm Booster object):
gbm_2 = lgb.train(params, train_data, num_boost_round=10,
                  init_model=gbm_init)
print("Option 2 current iter# %d" % gbm_2.current_iteration())

https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.train.html

Tarek Oraby
  • do you know if, in practice, you can continue the training process but pass new training data? I.e. perhaps a superset of the original training data? I'm trying to simulate online-learning with LightGBM. – Greg Aponte Mar 01 '23 at 00:25
  • I'm not sure if it's possible. – Tarek Oraby Mar 05 '23 at 11:51

To carry on training you must run lgb.train again and make sure you include init_model='model.txt' in the parameters. To confirm you have done this correctly, the evaluation feedback printed during training should continue from where lgb.cv left off. Then save the model's best iteration like this: bst.save_model('model.txt', num_iteration=bst.best_iteration).
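
A minimal sketch of that save-and-resume loop, assuming toy binary-classification data; the dataset, parameter, and file names here are illustrative only, and the early_stopping_rounds keyword follows the older LightGBM API that the question itself uses (newer versions expect an early-stopping callback instead):

import numpy as np
import lightgbm as lgb

# illustrative toy data; in practice reuse your own lgb_train_set
data = np.random.rand(1000, 10)
label = np.random.randint(2, size=1000)
train_set = lgb.Dataset(data[:800], label=label[:800], free_raw_data=False)
valid_set = lgb.Dataset(data[800:], label=label[800:], reference=train_set)
params = {'objective': 'binary'}

# first pass: train with early stopping and save the best iteration
bst = lgb.train(params, train_set, num_boost_round=100,
                valid_sets=[valid_set], early_stopping_rounds=20)
bst.save_model('model.txt', num_iteration=bst.best_iteration)

# later passes: resume from the saved file, then overwrite it again
bst = lgb.train(params, train_set, num_boost_round=100,
                valid_sets=[valid_set], early_stopping_rounds=20,
                init_model='model.txt')
bst.save_model('model.txt', num_iteration=bst.best_iteration)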

norm

The init_model option does not work by itself. You also have to set the keep_training_booster parameter for the train method:

lgb_params = {
  'keep_training_booster': True,
  'objective': 'regression',
  'verbosity': 100,
}
lgb.train(lgb_params, init_model= ...)

Or as a function parameter:

lgb.train(lgb_params, keep_training_booster=True, init_model= ...)
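
For context, a runnable sketch of the two calls chained in memory (the data and parameter names below are illustrative, not from the answer): keep_training_booster=True keeps the returned Booster usable for further training, and that Booster is then passed as init_model to the next call.

import numpy as np
import lightgbm as lgb

# illustrative toy regression data
X = np.random.rand(500, 10)
y = np.random.rand(500)
train_set = lgb.Dataset(X, label=y, free_raw_data=False)
lgb_params = {'objective': 'regression'}

# first call: keep the returned Booster trainable
gbm = lgb.train(lgb_params, train_set, num_boost_round=10,
                keep_training_booster=True)

# second call: continue training from the in-memory Booster
gbm = lgb.train(lgb_params, train_set, num_boost_round=10,
                keep_training_booster=True, init_model=gbm)

print(gbm.current_iteration())  # 20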
Moradnejad
    From the documentation `keep_training_booster (bool, optional (default=False)) – Whether the returned Booster will be used to keep training. If False, the returned value will be converted into _InnerPredictor before returning. When your model is very large and cause the memory error, you can try to set this param to True to avoid the model conversion performed during the internal call of model_to_string. You can still use _InnerPredictor as init_model for future continue training.` It does not read like it needs to be set to True. – Learning is a mess Mar 17 '21 at 17:13

It seems that lightgbm does not allow passing a model instance as init_model, because it only takes a filename:

init_model (string or None, optional (default=None)) – Filename of LightGBM model or Booster instance used for continue training.

link

Nick