In Scikit-Learn's documentation for the DecisionTreeClassifier class, the presort hyperparameter is described like this:

    presort : bool, optional (default=False)
        Whether to presort the data to speed up the finding of best splits in
        fitting. For the default settings of a decision tree on large
        datasets, setting this to true may slow down the training process.
        When using either a smaller dataset or a restricted depth, this may
        speed up the training.
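For reference, presort is just a constructor argument. Here is a minimal usage sketch; the synthetic dataset is my own choice purely for illustration, and since later scikit-learn releases deprecated and then removed this parameter, it assumes an older version:

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    # Small synthetic dataset, purely for illustration.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

    # presort=True asks the tree builder to sort every feature once up front;
    # with the default presort=False, candidate splits are sorted per node.
    tree = DecisionTreeClassifier(presort=True, random_state=42)
    tree.fit(X, y)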
I don't understand why presorting would slow down training on large datasets yet speed it up on smaller ones; I would expect exactly the reverse. Indeed, the documentation on decision trees' computational complexity states that without presorting the complexity is O(n_features * n_samples^2 * log(n_samples)), but with presorting it becomes O(n_features * n_samples * log(n_samples)).
Therefore I would expect presorting to take a little extra time up front, slowing down training slightly, but for this cost to be more than compensated for when the training set is large.
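To make that expectation concrete, here is a quick back-of-the-envelope comparison of the two stated bounds. This just plugs numbers into the big-O expressions and ignores constant factors, so it is an intuition check rather than a measurement:

    import math

    def cost_without_presort(n_samples, n_features):
        # O(n_features * n_samples^2 * log(n_samples))
        return n_features * n_samples**2 * math.log2(n_samples)

    def cost_with_presort(n_samples, n_features):
        # O(n_features * n_samples * log(n_samples))
        return n_features * n_samples * math.log2(n_samples)

    for n in (1_000, 100_000, 10_000_000):
        ratio = cost_without_presort(n, 10) / cost_with_presort(n, 10)
        print(f"n_samples={n:>10,}: ratio = {ratio:,.0f}x in favor of presorting")

    # The ratio is simply n_samples, so the advantage of presorting should
    # grow with the training set size -- the opposite of what the docs say.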
Is this just a mistake in Scikit-Learn's documentation or did I miss something?
Edit
I ran some tests, and I found that presorting does indeed seem to slow down training on large training sets. In fact, I observe something like O(n_features * n_samples^2 * log(n_samples)), or even worse (i.e., exponential), with presorting, and O(n_features * n_samples * log(n_samples)) without it. Training only seems to be somewhat faster with presorting when n_samples is smaller than a few thousand.
So the empirical answer is "yes", but I would love to understand why.
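For anyone who wants to reproduce this, here is a minimal sketch of the kind of timing comparison I ran. The dataset sizes and the use of make_classification are illustrative choices rather than my exact setup, and it again assumes a scikit-learn version that still accepts presort:

    import time

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    def time_fit(n_samples, presort):
        X, y = make_classification(n_samples=n_samples, n_features=10,
                                   random_state=42)
        clf = DecisionTreeClassifier(presort=presort, random_state=42)
        start = time.perf_counter()
        clf.fit(X, y)
        return time.perf_counter() - start

    for n in (1_000, 5_000, 20_000, 100_000):
        t_off = time_fit(n, presort=False)
        t_on = time_fit(n, presort=True)
        print(f"n_samples={n:>7,}: presort=False {t_off:7.3f}s | "
              f"presort=True {t_on:7.3f}s")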