max_depth VS min_samples_leaf
The parameters max_depth and min_samples_leaf are confusing me the most after multiple attempts at using GridSearchCV. To my understanding, both of these parameters are ways of controlling the depth of the trees; please correct me if I'm wrong.
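To make the comparison concrete, here is a minimal sketch of how one could check the effect of each parameter on the depth the trees actually reach. It assumes a synthetic dataset from make_classification (not my real data), and mean_tree_depth is a hypothetical helper written only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption, not the dataset from the question).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)


def mean_tree_depth(**params):
    """Hypothetical helper: fit a small forest and return the average fitted tree depth."""
    forest = RandomForestClassifier(n_estimators=20, random_state=0, **params)
    forest.fit(X, y)
    # Each fitted tree exposes the depth it actually reached via tree_.max_depth.
    return float(np.mean([tree.tree_.max_depth for tree in forest.estimators_]))


depth_unrestricted = mean_tree_depth()                 # trees grow until leaves are pure
depth_capped = mean_tree_depth(max_depth=3)            # hard limit on depth
depth_big_leaves = mean_tree_depth(min_samples_leaf=50)  # indirect limit via leaf size

print("unrestricted:", depth_unrestricted)
print("max_depth=3:", depth_capped)
print("min_samples_leaf=50:", depth_big_leaves)
```

max_depth caps the depth directly, while min_samples_leaf only limits it indirectly by refusing splits that would leave too few samples in a leaf, so the two are related but not interchangeable.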
max_features
I'm doing a very simple classification task, and changing min_samples_leaf seems to have no effect on the AUC score; however, tuning the depth improves my AUC from 0.79 to 0.84, which is pretty drastic. Nothing else seems to affect it either. I thought the main thing I should tune is max_features; however, the best value is not far off from sqrt(n_features).
scoring='roc_auc'
Another issue: I noticed that if all the other parameters are fixed while the number of trees is varied, GridSearchCV will always select the highest number of trees. This is understandable, but the AUC drops slightly for some reason even though scoring='roc_auc'. Why is this happening? Does it consider the oob_score instead?
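One way to check which score drives the selection is to look at the cv_results_ attribute after fitting. The sketch below assumes synthetic data from make_classification and an arbitrary small grid over n_estimators (neither is from my actual setup); it prints the mean cross-validated AUC per candidate next to the candidate GridSearchCV picked.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, not the dataset from the question).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Only the number of trees varies; everything else stays at its default.
param_grid = {"n_estimators": [10, 50, 100]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # candidates are ranked by cross-validated ROC AUC
    cv=3,
)
search.fit(X, y)

mean_scores = search.cv_results_["mean_test_score"]
for params, score in zip(search.cv_results_["params"], mean_scores):
    print(params, "mean CV AUC:", score)

# best_score_ is the mean cross-validated score of the selected candidate.
print("selected:", search.best_params_, "best_score_:", search.best_score_)
```

In this setup best_score_ is the mean AUC over the held-out folds for the selected candidate, and the out-of-bag score is not involved in the ranking. An AUC computed some other way (for example on a separate test set, or on a single split) can therefore differ slightly from it.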
Please feel free to share any resources that can be helpful in understanding how random forests can be tuned systematically, as it seems there are a few related parameters affecting each other.