I have an old linear model which I wish to improve using XGBoost. I have the predictions from the old model, which I wish to use as a base margin. Also, due to the nature of what I'm modeling, I need to use weights. My old GLM is a Poisson regression with formula `number_of_defaults/exposure ~ param_1 + param_2` and weights set to `exposure` (the same as the denominator in the response variable). When training the new XGBoost model on data, I do this:
```python
xgb_model = xgb.XGBRegressor(n_estimators=25,
                             max_depth=100,
                             max_leaves=100,
                             learning_rate=0.01,
                             n_jobs=4,
                             eval_metric="poisson-nloglik")
model = xgb_model.fit(X=X_train, y=y_train, sample_weight=_WEIGHT, base_margin=_BASE_MARGIN)
```
where `_WEIGHT` and `_BASE_MARGIN` are the weights and old-model predictions (popped out of `X_train`).
But how do I do cross-validation or out-of-sample analysis when I need to specify weights and a base margin?
As far as I can see, I could use sklearn's `GridSearchCV`, but then I would need to specify the weights and base margin in `XGBRegressor()` (instead of in `fit()` as above). The closest thing to `base_margin` in `XGBRegressor()` seems to be the `base_score` argument, but there is no constructor argument for weights at all.
Alternatively, I could forget about cross-validation and just use a training and a test dataset with the `eval_set` argument of `fit()`, but then I don't see a way of specifying the weights and base margin for the different sets.
Any guidance in the right direction is much appreciated!