
In Scikit-Learn's documentation for the DecisionTreeClassifier class, the presort hyperparameter is described like this:

presort : bool, optional (default=False)

Whether to presort the data to speed up the finding of best splits in fitting. For the default settings of a decision tree on large datasets, setting this to true may slow down the training process. When using either a smaller dataset or a restricted depth, this may speed up the training.
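For context, this is where the hyperparameter is set (a minimal example; note that presort only exists in scikit-learn versions before 0.22, where it was deprecated):

    from sklearn.tree import DecisionTreeClassifier

    # presort was deprecated in scikit-learn 0.22 and removed in 0.24,
    # so this only works on earlier versions.
    clf = DecisionTreeClassifier(presort=True)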

I don't understand why presorting would slow down training on large datasets and speed up training on smaller datasets. I would expect exactly the reverse. Indeed, the documentation about decision trees' computational complexity states that without presorting, the complexity is O(n_features * n_samples^2 * log(n_samples)), but with presorting it becomes O(n_features * n_samples * log(n_samples)).

Therefore I would expect presorting to take a little extra time up front, slowing training down slightly, but that cost should be more than compensated for when the training set is large.

Is this just a mistake in Scikit-Learn's documentation or did I miss something?

Edit

I ran some tests and I found that presorting does seem to slow down training on large training sets. In fact I observe something like O(n_features * n_samples^2 * log(n_samples)), or even worse (i.e. exponential), with presorting, and O(n_features * n_samples * log(n_samples)) without presorting. Training only seems to be somewhat faster with presorting when n_samples is smaller than a few thousand.

So the empirical answer is "yes", but I would love to understand why.
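For reference, here is a minimal sketch of the kind of timing test I mean (the synthetic dataset and sizes are illustrative only, and presort requires a scikit-learn version earlier than 0.22):

    import time
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def time_fit(n_samples, n_features=10, presort=False):
        rng = np.random.RandomState(42)
        X = rng.rand(n_samples, n_features)
        y = (X[:, 0] + 0.1 * rng.randn(n_samples) > 0.5).astype(int)
        tree = DecisionTreeClassifier(presort=presort, random_state=42)
        t0 = time.time()
        tree.fit(X, y)
        return time.time() - t0

    # Compare presort=False vs presort=True as n_samples grows.
    for n in [1000, 10000, 100000]:
        print(n, time_fit(n, presort=False), time_fit(n, presort=True))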

MiniQuark
  • It appears scikit-learn deprecated this argument in version 0.22 and will be removing it in 0.24. I don't know what it was doing behind the scenes that made things slower for large datasets, but I thought I would point this out. – Will Mar 06 '20 at 01:04

1 Answer


It seems that presort can speed up the search for the best splits during fitting, but additional time is needed up front to sort the training data.

Given the complexities you mention in your edit:

with presort: O(n_features * n_samples^2 * log(n_samples))
without presort: O(n_features * n_samples * log(n_samples))

It makes some sense that the complexity is multiplied by an extra n_samples factor, since O(n * log(n)) is the best complexity that a comparison-based sorting algorithm can provide, based on this link.

However, I am not sure how much presorting actually helps during training; I could not find any resource on how sklearn implements it.
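To illustrate the idea only (a conceptual sketch of what presorting means, not sklearn's actual implementation): the sort order of each feature column can be computed once up front, so the split search can scan candidate thresholds in sorted order.

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.rand(1000, 5)

    # One-time cost: argsort every feature column up front,
    # O(n_features * n_samples * log(n_samples)).
    sorted_idx = np.argsort(X, axis=0)   # shape (n_samples, n_features)

    # The split search on a feature can then scan candidate thresholds
    # in sorted order without re-sorting at every node.
    feature = 0
    for rank, i in enumerate(sorted_idx[:, feature]):
        threshold = X[i, feature]
        n_left = rank + 1   # number of samples with value <= threshold
        # ...evaluate the impurity of the split X[:, feature] <= threshold...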

davidsinjaya