Say I have n training samples and a binary classification task. I want to train a decision tree of the smallest possible depth, with the fewest possible total nodes, such that the training accuracy on these n samples is 100%. In the worst case, this would mean one leaf node per sample. Is there some configuration of parameters in Scikit-Learn's implementation [1] of the DecisionTreeClassifier that would let me achieve this?
madman_with_a_box
- `max_depth`: The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. – ombk Nov 24 '20 at 14:13
- If you don't set the max depth, it will develop the tree to the max. – ombk Nov 24 '20 at 14:13
- That's not really true, I think. `max_depth` sets an upper limit on the depth. But if you set (say) `max_depth = 1000`, it is not always the case that `clf.get_depth() == max_depth`. – madman_with_a_box Nov 24 '20 at 15:47
- Which one is smaller, `clf.get_depth()`? :p – ombk Nov 24 '20 at 15:54
- I don't think you understand how trees work. The algorithm tries to split your data into baskets of pure leaves; if it reaches a point where everything is split, it stops. Therefore `clf.get_depth()` won't be as big as the `max_depth` you set: it stops once it has built the full tree, which might need only a depth of 6. – ombk Nov 24 '20 at 15:58
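The behavior discussed in the comments can be checked with a small sketch (toy data made up for illustration, not from the thread): even with a huge `max_depth`, the tree stops growing as soon as all leaves are pure, so `get_depth()` can be far below the limit.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data that is perfectly separable by a single split at x < 1.5.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# max_depth is only an upper bound; growth stops when leaves are pure.
clf = DecisionTreeClassifier(max_depth=1000, random_state=0)
clf.fit(X, y)
print(clf.get_depth())  # 1 -- far below max_depth=1000
```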
1 Answer
Reading the documentation gives you your answer: if you don't set a limit via `max_depth` (the default, `None`), the tree keeps expanding until every leaf is pure (or contains fewer than `min_samples_split` samples).
You can also check a similar question here.
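Concretely, the defaults already do what the question asks, as long as no two identical samples carry different labels. A minimal sketch (synthetic data chosen here for illustration): leaving `max_depth=None`, `min_samples_split=2`, `min_samples_leaf=1`, and `ccp_alpha=0.0` at their defaults grows the tree until every leaf is pure, which yields 100% training accuracy.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 5)            # 100 samples, 5 continuous features (all rows distinct)
y = rng.randint(0, 2, 100)      # random binary labels -- hard to fit, forces a deep tree

# Default parameters: max_depth=None, min_samples_split=2,
# min_samples_leaf=1, ccp_alpha=0.0. The tree is grown until all
# leaves are pure, so training accuracy is 100% whenever no two
# identical samples have conflicting labels.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
print(clf.score(X, y))          # 1.0
print(clf.get_depth())          # whatever depth was needed, not a preset limit
```

Note that "smallest possible depth / fewest total nodes" is not guaranteed: CART grows the tree greedily, so the fitted tree is pure but not necessarily minimal.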

ombk