I was experimenting with different classifiers in sklearn and noticed that GradientBoostingClassifier returns exactly the same scores no matter what value I pass for its random_state parameter. For example, when I run the following code:
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, :2]  # keep only the first two features
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Fit with ten different seeds and collect the test scores
scores = []
for i in range(10):
    clf = GradientBoostingClassifier(random_state=i).fit(X_train, y_train)
    score = clf.score(X_test, y_test)
    scores = np.append(scores, score)
print(scores)
the output is:
[ 0.66666667 0.66666667 0.66666667 0.66666667 0.66666667 0.66666667
0.66666667 0.66666667 0.66666667 0.66666667]
However, when I run the same loop with another classifier, such as RandomForestClassifier:
from sklearn.ensemble import RandomForestClassifier

scores = []
for i in range(10):
    clf = RandomForestClassifier(random_state=i).fit(X_train, y_train)
    score = clf.score(X_test, y_test)
    scores = np.append(scores, score)
print(scores)
The output is what you would expect: the scores vary slightly from seed to seed:
[ 0.6 0.56666667 0.63333333 0.76666667 0.6 0.63333333
0.66666667 0.56666667 0.66666667 0.53333333]
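To rule out a bug in my scoring loop, I also compared the raw predictions of two models fitted with different seeds directly (a quick sanity check; np.array_equal just tests element-wise equality):

# Sanity check: do two different seeds produce literally identical models?
clf_a = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
clf_b = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)
print(np.array_equal(clf_a.predict(X_test), clf_b.predict(X_test)))

This printed True for every pair of seeds I tried, which matches the identical scores above.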
What could be causing GradientBoostingClassifier to ignore the random state? I printed the fitted classifier to check its parameters, but everything looks normal:
print(clf)
GradientBoostingClassifier(criterion='friedman_mse', init=None,
learning_rate=0.1, loss='deviance', max_depth=3,
max_features=None, max_leaf_nodes=None,
min_impurity_split=1e-07, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=100, presort='auto', random_state=9,
subsample=1.0, verbose=0, warm_start=False)
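One thing I did notice in that printout is subsample=1.0 and max_features=None. Could it be that with the defaults nothing is ever actually sampled, so the seed is simply never used? Here is a sketch of how I was going to test that hunch (subsample=0.5 is just an arbitrary value I picked for the experiment, not something from my original code):

# Hypothetical test: turn on row subsampling so the seed has something to do;
# subsample=0.5 is an arbitrary choice for this experiment
scores = []
for i in range(10):
    clf = GradientBoostingClassifier(subsample=0.5, random_state=i)
    clf.fit(X_train, y_train)
    scores = np.append(scores, clf.score(X_test, y_test))
print(scores)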
I also tried messing around with warm_start and presort, but neither changed anything. Any ideas? I've been trying to figure this out for almost an hour, so I figured I'd ask here. Thank you for your time!