To do so, you'll have to dig into the detailed results of the whole grid search CV procedure; fortunately, these detailed results are returned in the cv_results_ attribute of the GridSearchCV object (docs).
I have rerun your code as-is, but I am not retyping it here; suffice it to say that, despite explicitly setting the random number generator's seed, I am getting a different final result (presumably due to different library versions), namely:
{'min_samples_split': 322}
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=322,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=42, splitter='best')
but this is not important for the issue at hand here.
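For context, here is a minimal sketch of the kind of multi-metric grid search assumed throughout the rest of this answer; the dataset, the CV setting, and the exact parameter grid below are placeholders of mine, not your actual code (the scorer names are what produce the *_PRECISION and *_F1 columns shown further down):

from sklearn.datasets import load_breast_cancer   # placeholder dataset, not yours
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# two scoring metrics; the dict keys become the suffixes in cv_results_
scoring = {'PRECISION': 'precision', 'F1': 'f1'}

gs = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={'min_samples_split': range(2, 403, 10)},  # guessed from the results shown below
    scoring=scoring,
    refit='F1',   # the metric used to select the single refitted best_estimator_
    cv=5,
)
gs.fit(X, y)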
The easiest way to work with the returned cv_results_ dictionary is to convert it to a pandas DataFrame:
import pandas as pd
cv_results = pd.DataFrame.from_dict(gs.cv_results_)
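To get an idea of everything that is in there, you can simply list the dataframe's columns; with multi-metric scoring, the per-split, mean, std, and rank entries are suffixed with your scorer names:

print(list(cv_results.columns))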
Still, since it includes a lot of information (columns), I will simplify it further here to demonstrate the issue (feel free to explore it more fully yourself):
df = cv_results[['params', 'mean_test_PRECISION', 'rank_test_PRECISION', 'mean_test_F1', 'rank_test_F1']]
pd.set_option("display.max_rows", None, "display.max_columns", None)
pd.set_option('expand_frame_repr', False)
print(df)
Result:
params mean_test_PRECISION rank_test_PRECISION mean_test_F1 rank_test_F1
0 {'min_samples_split': 2} 0.771782 1 0.763041 41
1 {'min_samples_split': 12} 0.768040 2 0.767331 38
2 {'min_samples_split': 22} 0.767196 3 0.776677 29
3 {'min_samples_split': 32} 0.760282 4 0.773634 32
4 {'min_samples_split': 42} 0.754572 8 0.777967 26
5 {'min_samples_split': 52} 0.754034 9 0.777550 27
6 {'min_samples_split': 62} 0.758131 5 0.773348 33
7 {'min_samples_split': 72} 0.756021 6 0.774301 30
8 {'min_samples_split': 82} 0.755612 7 0.768065 37
9 {'min_samples_split': 92} 0.750527 10 0.771023 34
10 {'min_samples_split': 102} 0.741016 11 0.769896 35
11 {'min_samples_split': 112} 0.740965 12 0.765353 39
12 {'min_samples_split': 122} 0.731790 13 0.763620 40
13 {'min_samples_split': 132} 0.723085 14 0.768605 36
14 {'min_samples_split': 142} 0.713345 15 0.774117 31
15 {'min_samples_split': 152} 0.712958 16 0.776721 28
16 {'min_samples_split': 162} 0.709804 17 0.778287 24
17 {'min_samples_split': 172} 0.707080 18 0.778528 22
18 {'min_samples_split': 182} 0.702621 19 0.778516 23
19 {'min_samples_split': 192} 0.697630 20 0.778103 25
20 {'min_samples_split': 202} 0.693011 21 0.781047 10
21 {'min_samples_split': 212} 0.693011 21 0.781047 10
22 {'min_samples_split': 222} 0.693011 21 0.781047 10
23 {'min_samples_split': 232} 0.692810 24 0.779705 13
24 {'min_samples_split': 242} 0.692810 24 0.779705 13
25 {'min_samples_split': 252} 0.692810 24 0.779705 13
26 {'min_samples_split': 262} 0.692810 24 0.779705 13
27 {'min_samples_split': 272} 0.692810 24 0.779705 13
28 {'min_samples_split': 282} 0.692810 24 0.779705 13
29 {'min_samples_split': 292} 0.692810 24 0.779705 13
30 {'min_samples_split': 302} 0.692810 24 0.779705 13
31 {'min_samples_split': 312} 0.692810 24 0.779705 13
32 {'min_samples_split': 322} 0.688417 33 0.782772 1
33 {'min_samples_split': 332} 0.688417 33 0.782772 1
34 {'min_samples_split': 342} 0.688417 33 0.782772 1
35 {'min_samples_split': 352} 0.688417 33 0.782772 1
36 {'min_samples_split': 362} 0.688417 33 0.782772 1
37 {'min_samples_split': 372} 0.688417 33 0.782772 1
38 {'min_samples_split': 382} 0.688417 33 0.782772 1
39 {'min_samples_split': 392} 0.688417 33 0.782772 1
40 {'min_samples_split': 402} 0.688417 33 0.782772 1
The names of the columns should be self-explanatory: they include the parameters tried, the scores for each of the metrics used, and the corresponding ranks (1 meaning the best). You can immediately see, for example, that although 'min_samples_split': 322 does indeed give the best F1 score, it is not the only parameter setting that does so; several other settings also achieve the best F1 score and therefore share a rank_test_F1 of 1 in the results.
From this point, it is trivial to get the info you want; for example, here are the best models for each of your two metrics:
print(df.loc[df['rank_test_PRECISION']==1]) # best precision
# result:
params mean_test_PRECISION rank_test_PRECISION mean_test_F1 rank_test_F1
0 {'min_samples_split': 2} 0.771782 1 0.763041 41
print(df.loc[df['rank_test_F1']==1]) # best F1
# result:
params mean_test_PRECISION rank_test_PRECISION mean_test_F1 rank_test_F1
32 {'min_samples_split': 322} 0.688417 33 0.782772 1
33 {'min_samples_split': 332} 0.688417 33 0.782772 1
34 {'min_samples_split': 342} 0.688417 33 0.782772 1
35 {'min_samples_split': 352} 0.688417 33 0.782772 1
36 {'min_samples_split': 362} 0.688417 33 0.782772 1
37 {'min_samples_split': 372} 0.688417 33 0.782772 1
38 {'min_samples_split': 382} 0.688417 33 0.782772 1
39 {'min_samples_split': 392} 0.688417 33 0.782772 1
40 {'min_samples_split': 402} 0.688417 33 0.782772 1
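Finally, should you decide after this inspection that you actually prefer, say, the precision-optimal setting over the one GridSearchCV refitted for you, nothing stops you from fitting such a model yourself. A minimal sketch of the idea (X and y stand for your own training data, which I don't have):

from sklearn.base import clone

# take the parameter dict of the row ranked 1st for precision
best_precision_params = df.loc[df['rank_test_PRECISION'] == 1, 'params'].iloc[0]

# build a fresh, unfitted copy of the grid search's base estimator with these parameters
best_precision_model = clone(gs.estimator).set_params(**best_precision_params)
best_precision_model.fit(X, y)  # X, y: your training data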