Say I have n training samples and a binary classification task. I want to train a decision tree of the smallest possible depth, with the fewest possible total nodes, such that the training accuracy on these n samples is 100%. In the worst case, this would mean one leaf node per sample. Is there some configuration of parameters in Scikit-Learn's implementation [1] of the DecisionTreeClassifier that would let me achieve this?
madman_with_a_box
- `max_depth`: The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. – ombk Nov 24 '20 at 14:13
- If you don't set the max depth, it will develop the tree to the max. – ombk Nov 24 '20 at 14:13
- That's not really true, I think. `max_depth` sets an upper limit on the depth. But if you set (say) `max_depth = 1000`, it is not always the case that `clf.get_depth() == max_depth`. – madman_with_a_box Nov 24 '20 at 15:47
- Which one is smaller, `clf.get_depth()`? :p – ombk Nov 24 '20 at 15:54
- I don't think you understand how trees work. The algorithm tries to split your data into baskets of pure leaves; if it reaches a point where everything is split, it stops. Therefore `clf.get_depth()` won't be as big as the `max_depth` you set: it stops once it has built the full tree, which might need only a depth of 6. – ombk Nov 24 '20 at 15:58
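The behavior discussed in the comments can be checked with a small sketch (toy data made up for illustration, not from the thread): even with a huge `max_depth`, the tree stops growing as soon as all leaves are pure, so `get_depth()` can be far below the limit.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data that is perfectly separable by a single split at x < 1.5.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# max_depth is only an upper bound; growth stops when leaves are pure.
clf = DecisionTreeClassifier(max_depth=1000, random_state=0)
clf.fit(X, y)
print(clf.get_depth())  # 1 -- far below max_depth=1000
```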
1 Answer
Reading the documentation gives you your answer: if you don't set a limit via `max_depth` (the default, `None`), the tree keeps expanding until every leaf is pure (or contains fewer than `min_samples_split` samples).
You can also check a similar question here.
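Concretely, the defaults already do what the question asks, as long as no two identical samples carry different labels. A minimal sketch (synthetic data chosen here for illustration): leaving `max_depth=None`, `min_samples_split=2`, `min_samples_leaf=1`, and `ccp_alpha=0.0` at their defaults grows the tree until every leaf is pure, which yields 100% training accuracy.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 5)            # 100 samples, 5 continuous features (all rows distinct)
y = rng.randint(0, 2, 100)      # random binary labels -- hard to fit, forces a deep tree

# Default parameters: max_depth=None, min_samples_split=2,
# min_samples_leaf=1, ccp_alpha=0.0. The tree is grown until all
# leaves are pure, so training accuracy is 100% whenever no two
# identical samples have conflicting labels.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
print(clf.score(X, y))          # 1.0
print(clf.get_depth())          # whatever depth was needed, not a preset limit
```

Note that "smallest possible depth / fewest total nodes" is not guaranteed: CART grows the tree greedily, so the fitted tree is pure but not necessarily minimal.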

ombk