
I'm trying to find out how sklearn's gradient boosting classifier makes predictions from the different estimators.

I want to translate the sklearn model into plain Python to perform predictions. I know how to get the individual estimators from the model, but I do not know how to get from those individual estimator scores to the final probability predictions made by the ensemble. I believe a sigmoid function is involved somewhere, but I can't work out where.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

GBC = GradientBoostingClassifier(n_estimators=1)
GBC.fit(x_train, y_train, sample_weight=None)
GBC.predict_proba(np.array(x_test.iloc[0]).reshape(1,-1))

This returns the probabilities array([[0.23084247, 0.76915753]]), but when I run:

Sole_estimator = GBC.estimators_[0][0]
Sole_estimator.predict(np.array(x_test.iloc[0]).reshape(1,-1)) 

which returns array([1.34327168]). Applying scipy's expit to the output:

from scipy.special import expit
expit(Sole_estimator.predict(np.array(x_test.iloc[0]).reshape(1,-1)))

I get:

array([0.79302745])

I believe the .init_ estimator contributes to the predictions, but I haven't found out how. I would also appreciate any indication of how the predictions are made with n_estimators > 1, if it differs.
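
To make concrete what I am after, here is a minimal plain-Python sketch of how I suspect the pieces combine. It assumes binary classification with the default loss and the default init_ (a prior-based dummy estimator), so that the raw score starts at the log-odds of the positive-class prior and each tree adds learning_rate times its output before the sigmoid is applied. The function name gbc_positive_proba and its parameters are hypothetical names of mine, not part of sklearn, and the whole thing is an assumption to be checked against GBC.decision_function and GBC.predict_proba rather than a confirmed description of the library.

```python
import math


def sigmoid(z):
    """Logistic function, the plain-Python equivalent of scipy's expit."""
    return 1.0 / (1.0 + math.exp(-z))


def gbc_positive_proba(prior_pos, learning_rate, tree_outputs):
    """Hypothetical helper: probability of the positive class for one sample.

    prior_pos     -- fraction of (weighted) training samples in the positive
                     class; assumed to be what the default init_ learns
                     (possibly GBC.init_.class_prior_[1], version dependent)
    learning_rate -- GBC.learning_rate (0.1 by default)
    tree_outputs  -- one value per boosting stage, e.g.
                     GBC.estimators_[i][0].predict(x)[0]
    """
    # Assumed starting point: log-odds of the prior.
    raw_score = math.log(prior_pos / (1.0 - prior_pos))
    # Assumed update: each stage adds its shrunken tree output.
    for value in tree_outputs:
        raw_score += learning_rate * value
    # Sigmoid maps the accumulated raw score to a probability.
    return sigmoid(raw_score)


if __name__ == "__main__":
    # Illustrative inputs only: a balanced prior and no trees gives 0.5.
    print(gbc_positive_proba(0.5, 0.1, []))
```

If this assumption holds, the single tree output of 1.34327168 above would only contribute 0.1 * 1.34327168 to the raw score, which would explain why applying expit to it directly does not reproduce 0.76915753. With more than one estimator the loop would simply run over every stage. I have not checked the multi-class case, which fits several trees per stage and presumably uses a softmax instead.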

Thanks :)

Theoaf
