From the XGBoost API docs, iteration_range seems suitable for this request, if I understood the question correctly:
iteration_range (Tuple[int, int]) –
Specifies which layer of trees are used in prediction. For example, if a random forest is trained with 100 rounds. Specifying iteration_range=(10, 20), then only the forests built during [10, 20) (half open set) rounds are used in this prediction.
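To see what the half-open range means in practice, here is a minimal sketch on synthetic data (a hypothetical toy setup, separate from the example below): slicing the first 10 trees of a 20-round model should reproduce a model trained for only 10 rounds on the same data with the same parameters.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)
dtoy = xgb.DMatrix(X, label=y)

full = xgb.train({"objective": "reg:squarederror"}, dtoy, num_boost_round=20)
short = xgb.train({"objective": "reg:squarederror"}, dtoy, num_boost_round=10)

# The first 10 trees of `full` are the same trees that `short` contains,
# so these two prediction vectors should (near-)match.
print(full.predict(dtoy, iteration_range=(0, 10))[:3])
print(short.predict(dtoy)[:3])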
For illustration, I used the California housing data to train an XGBoost regressor:
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

housing = fetch_california_housing()
X_train, X_valid, y_train, y_valid = train_test_split(housing.data, housing.target,
                                                      test_size=0.33, random_state=11)
dtrain = xgb.DMatrix(data=X_train, label=y_train)
dvalid = xgb.DMatrix(data=X_valid, label=y_valid, feature_names=list(housing.feature_names))
# define model and train
params_reg = {"max_depth": 4, "eta": 0.3, "objective": "reg:squarederror", "subsample": 1}
xgb_model_reg = xgb.train(params=params_reg, dtrain=dtrain, num_boost_round=100,
                          early_stopping_rounds=20, evals=[(dtrain, "train")])
# predict
y_pred = xgb_model_reg.predict(dvalid)
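A side note: with early_stopping_rounds set, the booster can stop before 100 rounds. Since the eval set here is the training set itself, it almost certainly runs all 100, but it is cheap to confirm (a sketch using Booster.num_boosted_rounds, available in recent XGBoost versions):

# How many boosting rounds were actually kept, in case early stopping
# ended training before num_boost_round.
print(xgb_model_reg.num_boosted_rounds())  # 100 expected here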
The prediction for a randomly chosen row, row 500, is 1.9630624. I then used iteration_range
to include exactly one tree at a time and printed the prediction for that row against each tree index:
for tree in range(0, 100):
    print(tree, xgb_model_reg.predict(dvalid, iteration_range=(tree, tree+1))[500])
Here is an extract of the output:
0 0.9880972
1 0.5706124
2 0.59768033
3 0.51785016
4 0.58512527
5 0.5990092
6 0.6660166
7 0.46186835
8 0.5213114
9 0.5857907
10 0.4683379
11 0.54352343
12 0.46028078
13 0.4823497
14 0.51296484
15 0.49818778
16 0.50080884
...
97 0.5000746
98 0.49949
99 0.5004089
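Note that each iteration_range=(tree, tree+1) prediction is not the tree's raw leaf value on its own: for reg:squarederror it is the model's base_score plus that single tree's contribution, which is why the later values hover around 0.5 (the default base_score). As a hedged sketch, the full prediction can be reconstructed from the per-tree slices by subtracting the duplicated base_score terms (reading base_score out of save_config is an assumption about the config layout that holds in recent XGBoost versions):

import json
import numpy as np

# base_score is a global bias that is added to every prediction slice.
config = json.loads(xgb_model_reg.save_config())
base_score = float(config["learner"]["learner_model_param"]["base_score"])

n_rounds = xgb_model_reg.num_boosted_rounds()
per_tree = np.array([
    xgb_model_reg.predict(dvalid, iteration_range=(t, t + 1))[500]
    for t in range(n_rounds)
])

# Each slice = base_score + leaf value of tree t, so drop the extra
# base_score copies before summing.
reconstructed = per_tree.sum() - (n_rounds - 1) * base_score
print(reconstructed)  # should be close to the full prediction, 1.9630624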