What is the concept behind the hyperparameter "splitter" in DecisionTreeClassifier by sklearn?

Question


I think that "splitter=random" means drawing a random threshold for each selected feature and then choosing the best among those random thresholds.
And "splitter=best" means finding the best threshold for each selected feature and then choosing the best among those best thresholds.

https://datascience.stackexchange.com/q/115359/55122 – Ben Reiniger, Jan 18 '23 at 15:29

score 0 · Answer 1 · answered Feb 20 '23 at 08:52

Here are the relevant references from the scikit-learn documentation:

sklearn.tree.DecisionTreeClassifier uses the default value splitter='best'.

sklearn.tree.ExtraTreeClassifier uses the default value splitter='random'.
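As a quick check, the two defaults can be read back from the estimators themselves. This is a minimal sketch that assumes scikit-learn is installed; the variable names dt_splitter and et_splitter are only illustrative.

```python
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

# get_params() returns the constructor arguments, including their defaults.
dt_splitter = DecisionTreeClassifier().get_params()["splitter"]
et_splitter = ExtraTreeClassifier().get_params()["splitter"]

print("DecisionTreeClassifier default splitter:", dt_splitter)
print("ExtraTreeClassifier default splitter:", et_splitter)
```

Either estimator also accepts the other value explicitly, e.g. DecisionTreeClassifier(splitter='random').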

The documentation for sklearn.tree.ExtraTreeClassifier describes what the random splitter does:

Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits (thresholds) are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.
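The two strategies can be sketched in plain Python for a single node. This is a simplified illustration and not scikit-learn's actual implementation: the helper names (gini, split_impurity, best_threshold, random_threshold, choose_split) are hypothetical, Gini impurity is assumed as the criterion, every feature is considered (no max_features subsampling), and the random threshold is assumed to be drawn uniformly between the feature's minimum and maximum at the node.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """'best' strategy: evaluate every midpoint between consecutive distinct values."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    if not candidates:  # constant feature: no real split is possible
        return xs[0]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))

def random_threshold(values, rng):
    """'random' strategy: draw a single threshold between min and max (assumed uniform)."""
    return rng.uniform(min(values), max(values))

def choose_split(X, y, strategy, rng):
    """Get one threshold per feature, then keep the (feature, threshold) with lowest impurity."""
    best = None
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if strategy == "best":
            t = best_threshold(column, y)
        else:
            t = random_threshold(column, rng)
        score = split_impurity(column, y, t)
        if best is None or score < best[2]:
            best = (j, t, score)
    return best  # (feature index, threshold, weighted impurity)

# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 7.0], [6.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]
rng = random.Random(0)

print("best:  ", choose_split(X, y, "best", rng))
print("random:", choose_split(X, y, "random", rng))
```

In both cases the final step is the same (keep the best candidate across features); the only difference is whether each feature contributes its exhaustively searched threshold or a single random draw, which is why the random strategy is cheaper per node but can produce a worse split.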

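In short, splitter='best' searches the candidate thresholds of each considered feature and keeps the best split overall, while splitter='random' draws one random threshold per considered feature and keeps the best of those draws.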
