I'm trying to find out how sklearn's gradient boosting classifier makes predictions from the different estimators.
I want to translate the sklearn model into base Python to perform predictions. I know how to get the individual estimators from the model, but I do not know how to get from those individual estimator scores to the final probability predictions made by the ensemble model. I believe a sigmoid function is involved, but I can't work out how.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
GBC = GradientBoostingClassifier(n_estimators=1)
GBC.fit(x_train, y_train, sample_weight=None)
GBC.predict_proba(np.array(x_test.iloc[0]).reshape(1,-1))
This returns the probabilities: array([[0.23084247, 0.76915753]])
But when I run:
Sole_estimator = GBC.estimators_[0][0]
Sole_estimator.predict(np.array(x_test.iloc[0]).reshape(1,-1))
which returns array([1.34327168]), and then apply scipy's expit (from scipy.special) to the output:
expit(Sole_estimator.predict(np.array(x_test.iloc[0]).reshape(1,-1)))
I get:
array([0.79302745])
which does not match the 0.76915753 returned by predict_proba.
I believe the .init_ estimator contributes to the predictions, but I haven't found out how. I would also appreciate any indication of how the predictions are made with n_estimators > 1, if it differs.
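To make the question concrete, here is a minimal base-Python sketch of how the pieces might fit together. It assumes a binary problem, the default log-loss objective, and the default learning_rate of 0.1 (not set explicitly above), so that the raw score is the init_ log-odds plus learning_rate times the sum of the tree outputs, passed through a sigmoid. The helper names (sigmoid, logit, boosted_proba) are hypothetical, and the init score is back-computed from the numbers above rather than read from the model.

```python
import math

def sigmoid(z):
    """Logistic function; equivalent to scipy.special.expit for a scalar."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the sigmoid: log-odds of a probability."""
    return math.log(p / (1.0 - p))

def boosted_proba(init_raw, learning_rate, tree_outputs):
    """P(class 1), assuming raw = init_raw + learning_rate * sum(tree outputs)."""
    raw = init_raw + learning_rate * sum(tree_outputs)
    return sigmoid(raw)

# Numbers reported in the question.
tree_output = 1.34327168   # Sole_estimator.predict(...)
reported_p1 = 0.76915753   # second column of GBC.predict_proba(...)
learning_rate = 0.1        # assumption: sklearn's default, since it was not set

# Init raw score implied by those numbers under the assumed formula.
implied_init_raw = logit(reported_p1) - learning_rate * tree_output
# If init_ is the default prior-based estimator, this would be the
# training-set share of the positive class.
implied_prior = sigmoid(implied_init_raw)

# With more than one estimator, tree_outputs would hold one value per tree.
p1 = boosted_proba(implied_init_raw, learning_rate, [tree_output])
print(implied_init_raw, implied_prior, p1)
```

If this is right, implied_prior should equal the fraction of positive labels in y_train, and GBC.decision_function on the same row should equal implied_init_raw + GBC.learning_rate * 1.34327168. With n_estimators > 1 the only change would be that tree_outputs holds one entry per tree in GBC.estimators_[:, 0].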
Thanks :)