
The gbm package in R has a function, gbm.perf, that finds the optimum number of trees for the model using methods such as the "Out-of-Bag" (OOB) or "Cross-Validation" error, which helps to avoid over-fitting.

Does GradientBoostingRegressor in the scikit-learn library in Python also have a similar way to find the optimum number of trees using the "out-of-bag" method?

# R code

library(gbm)

mod1 = gbm(var ~ ., data = dat, interaction.depth = 3)
# Pick the iteration that minimises the out-of-bag error estimate
best.iter = gbm.perf(mod1, method = "OOB")
scores = mean(predict(mod1, x, n.trees = best.iter))

# Python code

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

modl = GradientBoostingRegressor(max_depth=3)
modl.fit(x, y)
scores = np.mean(modl.predict(x))

1 Answer


Yes, gradient boosting in scikit-learn also has a way to find the best iteration using OOB estimates, just like in R. See the code below.

"in order to use oob_improvement_ in gdm the subsample should be less than 0.5"

# Fit a regressor with out-of-bag estimates
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

params = {
    "n_estimators": 1200,
    "max_depth": 3,
    "subsample": 0.5,  # must be < 1.0 for oob_improvement_ to be available
}
modl = GradientBoostingRegressor(**params)
modl.fit(x, y)  # oob_improvement_ is only populated after fitting

n_estimators = params["n_estimators"]
z = np.arange(n_estimators) + 1
# negative cumulative sum of OOB improvements gives the OOB loss curve
cumsum = -np.cumsum(modl.oob_improvement_)
# iteration with the minimum loss according to OOB
oob_best_iter = z[np.argmin(cumsum)]
print(oob_best_iter)

# Refit with the OOB-selected number of trees
modl = GradientBoostingRegressor(max_depth=3, subsample=0.5,
                                 n_estimators=oob_best_iter)
modl.fit(x, y)
  • I am able to get the desired results using the above code, but I have seen in this [link](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html#sklearn.ensemble.GradientBoostingRegressor) that "Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias." I don't know if this creates a problem! – lakshman thota Jul 29 '22 at 11:59
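
On the concern raised in the comment: if the extra bias from subsample < 1.0 is a worry, the number of trees can instead be chosen from the error on a held-out validation set via staged_predict, leaving subsample at its default of 1.0. The sketch below is a minimal illustration, not part of the original answer; the synthetic make_friedman1 data is just a stand-in for the x and y used above.

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (replace with your own x and y)
X, y = make_friedman1(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

modl = GradientBoostingRegressor(n_estimators=1200, max_depth=3)
modl.fit(X_train, y_train)

# staged_predict yields the prediction after each boosting stage,
# so the validation error can be tracked tree by tree
val_errors = [mean_squared_error(y_val, y_pred)
              for y_pred in modl.staged_predict(X_val)]
best_iter = int(np.argmin(val_errors)) + 1
print(best_iter)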