First of all, thanks to the people who work on the CatBoost library, it's amazing!
I am researching the combination of learning continuation, posterior sampling, and model shrinkage in gradient boosting, and I want to see what happens when these techniques are used together on regression problems.
To note: according to the CatBoost documentation, posterior sampling is a bootstrap type that enables uncertainty estimation for predictions. Model shrinkage is a technique that reduces overfitting by shrinking the weights of the trees over iterations. Learning continuation is a way to resume training from a previously saved model.
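To make the shrinkage idea concrete, here is a toy sketch in plain Python. This is my own simplification of the general concept, not CatBoost's actual internals (CatBoost's `model_shrink_rate` and `model_shrink_mode` parameters control the real behaviour):

```python
# Toy illustration of model shrinkage in boosting (a simplification,
# NOT CatBoost's exact implementation): at each boosting step, the
# ensemble's accumulated prediction is scaled down by a shrink factor
# before the new tree's contribution is added, so older trees gradually
# lose weight.

def boost_with_shrinkage(tree_preds, shrink_factor=0.99):
    """Combine per-tree predictions, shrinking the accumulated sum each step."""
    ensemble = 0.0
    for pred in tree_preds:
        ensemble = ensemble * shrink_factor + pred  # earlier trees are damped
    return ensemble

# With shrink_factor=0.5, three trees that each predict 1.0 contribute
# 0.25 + 0.5 + 1.0 respectively:
print(boost_with_shrinkage([1.0, 1.0, 1.0], shrink_factor=0.5))  # 1.75
```

The intuition is that resuming training (learning continuation) breaks this scheme, because the already-saved trees would need to keep being shrunk as new iterations are added.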
My question is: how can I combine model shrinkage (or posterior sampling, which seems to enable shrinkage internally) with learning continuation in CatBoost? Is there some way to get this behaviour without hitting the error message below? Or are there alternative libraries that also offer posterior sampling and SGLB that I should try instead?
Any help or suggestions would be appreciated. Thank you.
When I set the parameters loss_function='RMSEWithUncertainty' and posterior_sampling=True in CatBoostRegressor and apply learning continuation, I get the following output and error message:
output:
0: learn: 3.4441362 total: 372us remaining: 372us
1: learn: 3.3704627 total: 994us remaining: 0us
Model shrinkage in combination with learning continuation is not implemented yet. Reset model_shrink_rate to 0.
error:
C:/Go_Agent/pipelines/BuildMaster/catboost.git/catboost/libs/train_lib/options_helper.cpp:386: Model shrinkage and Posterior Sampling in combination with learning continuation is not implemented yet.
Here is a code snippet that shows how I use learning continuation with CatBoost:
from catboost import CatBoostRegressor

# Initialize data
train_data = [[1, 4, 5, 6],
              [4, 5, 6, 7],
              [30, 40, 50, 60]]
eval_data = [[2, 4, 6, 8],
             [1, 4, 50, 60]]
train_labels = [10, 20, 30]

# Initial parameters
model1 = CatBoostRegressor(iterations=2,
                           learning_rate=0.2,
                           depth=2,
                           loss_function='RMSEWithUncertainty',
                           posterior_sampling=True)

# Result will be in model1
model1.fit(train_data, train_labels)

# Continue training with the same parameters
model2 = CatBoostRegressor(iterations=2,
                           learning_rate=0.2,
                           depth=2,
                           loss_function='RMSEWithUncertainty',
                           posterior_sampling=True)

# Result will be in model2; model1 will be unchanged
model2.fit(train_data, train_labels, init_model=model1)