I'm trying to run a GridSearchCV over a DecisionTreeClassifier, with the only hyper-parameter being max_depth. The two versions I ran this with are:

1. max_depth = range(1,20): the best_estimator_ attribute shows a max_depth of 15, and the scoring function gives 0.8880 on the test set.
2. max_depth = range(1,15): the best_estimator_ attribute shows a max_depth of 10, with a higher score of 0.8907.

My question is: why doesn't GridSearchCV pick a max_depth of 10 the first time around if it gives a better score?
The code is as follows:
from sklearn import tree
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import make_scorer, fbeta_score, accuracy_score

clf = tree.DecisionTreeClassifier(random_state=7)
parameters = {"max_depth": range(1, 20), "random_state": [7]}
scorer = make_scorer(fbeta_score, beta=0.5)

# Grid-search over max_depth, scored with F0.5
grid_obj = GridSearchCV(estimator=clf, param_grid=parameters, scoring=scorer)
grid_fit = grid_obj.fit(X_train, y_train)
best_clf = grid_fit.best_estimator_

# Predictions from the default tree and from the tuned tree
predictions = clf.fit(X_train, y_train).predict(X_test)
best_predictions = best_clf.predict(X_test)

# Report the before-and-after scores
print best_clf
print "\nOptimized Model\n------"
print "Final accuracy score on the testing data: {:.4f}".format(accuracy_score(y_test, best_predictions))
print "Final F-score on the testing data: {:.4f}".format(fbeta_score(y_test, best_predictions, beta=0.5))
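
For what it's worth, here is a small sketch of how I could look at the scores GridSearchCV actually optimizes, i.e. the cross-validated scores on the training folds rather than the test-set scores I reported above. It assumes the same fitted grid_obj from the code above and uses the grid_scores_ / best_score_ attributes of the old sklearn.grid_search.GridSearchCV:

# Inspect the cross-validated scores that GridSearchCV maximizes
# (computed on validation folds of X_train, not on X_test).
print "Best CV score: {:.4f}".format(grid_obj.best_score_)
print "Best parameters: {}".format(grid_obj.best_params_)

# Mean validation score for every max_depth that was tried
for entry in grid_obj.grid_scores_:
    print entry.parameters["max_depth"], entry.mean_validation_score

(In newer scikit-learn versions, where GridSearchCV lives in sklearn.model_selection, grid_scores_ is replaced by the cv_results_ dict.)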