
I am using AdaBoost from scikit-learn with the typical DecisionTree weak learners. I would like to understand its runtime complexity in terms of the data size N and the number of weak learners T. I have searched for this information, including in the original AdaBoost papers by Yoav Freund and Robert Schapire, and have not found a clear-cut answer.

cloudyBlues

2 Answers


No disrespect meant to ogrisel, but his answer is lacking, as it completely ignores the number of features.

AdaBoost's time complexity is trivially O(T f), where f is the training cost of the weak learner in use, since AdaBoost simply fits T weak learners in sequence.

For a standard decision tree learner such as C4.5, the time complexity is O(N D^2), where D is the number of features. A single-level decision tree (a decision stump) would be O(N D).
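To make the roles of T and f concrete, here is a minimal sketch of plugging a stump into scikit-learn's AdaBoost (the `base_estimator` parameter shown here has been renamed `estimator` in recent scikit-learn releases, and the synthetic dataset is just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=10_000, n_features=20)

# A depth-1 tree (stump) costs roughly O(N D) per fit; AdaBoost
# trains n_estimators of them sequentially, so the total training
# cost is about O(T N D).
stump = DecisionTreeClassifier(max_depth=1)
model = AdaBoostClassifier(base_estimator=stump, n_estimators=100)
model.fit(X, y)
```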

You should never use experimentation to determine the runtime complexity of an algorithm, as has been suggested. First, you will be unable to easily distinguish between similar complexities such as O(N log(N)) and O(N log(N)^2). You also risk being fooled by underlying implementation details. For example, many sorts can exhibit O(N) behavior when the data is mostly sorted or contains only a few unique values. If you fed in an input with few unique values, the runtime would look faster than the expected general case.
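As one concrete illustration of that pitfall (my example, not part of the original answer): Python's built-in sort is adaptive Timsort, which runs in near-linear time on already-sorted input, so naive timing on the wrong data would suggest a better complexity than its O(N log N) general case:

```python
import random
import timeit

n = 1_000_000
sorted_data = list(range(n))
shuffled_data = sorted_data[:]
random.shuffle(shuffled_data)

# Timsort detects existing runs, so the presorted input finishes in
# near-linear time while the shuffled one pays the full O(N log N)
# cost -- timing alone would mislead you here.
t_sorted = timeit.timeit(lambda: sorted(sorted_data), number=5)
t_shuffled = timeit.timeit(lambda: sorted(shuffled_data), number=5)
print(f"presorted: {t_sorted:.3f}s, shuffled: {t_shuffled:.3f}s")
```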

Raff.Edward
  • Also, the actual complexity of decision stump learning depends on the splitter, but I believe it's O(D N log N) for a single, fixed-depth tree with optimal splitting. At each candidate split, all the samples have to be sorted by the value of some feature; the sorting dominates asymptotically. – Fred Foo Mar 16 '14 at 10:47
  • You can't say sorting dominates the time complexity, because it involves a different variable; sorting is but one part. The complexity is O(D N) if you use radix sort for numeric features or are using categorical features. I used general terms since there are many different types of possible decision tree induction methods. There are other ways to obtain O(D N) time without sorting. – Raff.Edward Mar 16 '14 at 21:02
  • That's all true, and I shouldn't have assumed N >> D because in some problems that's not true. But the question is about scikit-learn, which does use a classical comparison sort in its decision tree learning. – Fred Foo Mar 16 '14 at 22:08
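For reference, here is a rough sketch of the sort-based stump search discussed in this comment thread (my own illustration, not from the thread; the sample weights AdaBoost would supply are omitted, and only one prediction orientation is scored, for brevity). Sorting each of the D feature columns gives the O(D N log N) figure, while the threshold scan itself is only O(D N):

```python
import numpy as np

def best_stump_split(X, y):
    """Exhaustive stump search over a binary target y in {0, 1}."""
    n, d = X.shape
    best = (None, None, -np.inf)  # (feature index, threshold, #correct)
    for j in range(d):
        order = np.argsort(X[:, j])   # the O(N log N) sort, per feature
        xs, ys = X[order, j], y[order]
        pos_left = np.cumsum(ys)      # positives among the first i+1 rows
        total_pos = pos_left[-1]
        for i in range(n - 1):
            if xs[i] == xs[i + 1]:
                continue              # no threshold between equal values
            # predict 0 left of the threshold, 1 to the right
            correct = (i + 1 - pos_left[i]) + (total_pos - pos_left[i])
            if correct > best[2]:
                best = (j, (xs[i] + xs[i + 1]) / 2.0, correct)
    return best
```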

It's O(N · T). The linear dependency on T is certain, since the user selects the number of trees and they are trained sequentially.

I think the complexity of fitting trees in sklearn is O(N), where N is the number of samples in the training set. The number of features also has a linear impact when max_features is left at its default value.

To make sure, you can write a script that measures the training time of AdaBoost models for 10%, 20%, ..., 100% of your data and for n_estimators = 10, 20, ..., 100, then plot the results with matplotlib (see the sketch below).
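A minimal sketch of such a measurement script (the synthetic dataset and the fixed n_estimators=50 are my own choices for illustration; swap in your data and add an inner loop over n_estimators to check the dependency on T):

```python
import time

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=20_000, n_features=20)

fractions = np.arange(0.1, 1.01, 0.1)
fit_times = []
for frac in fractions:
    n = int(frac * len(X))
    model = AdaBoostClassifier(n_estimators=50)
    start = time.perf_counter()
    model.fit(X[:n], y[:n])   # time the full boosting run on n samples
    fit_times.append(time.perf_counter() - start)

# If fitting is linear in N, this curve should be close to a straight line.
plt.plot(fractions, fit_times, marker="o")
plt.xlabel("fraction of training data used")
plt.ylabel("training time (s)")
plt.show()
```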

Edit: as AdaBoost is generally applied to shallow trees (with max_depth between 1 and 7 in general), it might be the case that the dependency of the complexity on N is actually not linear. I think I measured a linear dependency for fully developed trees in the past (e.g. as in random forests). Shallow trees might have a complexity closer to O(N · log(N)), but I am not sure.

ogrisel
  • And where is the analysis of the **iterative** AdaBoost process itself here? – lejlot Mar 14 '14 at 09:37
  • 2
    AdaBoost is training one tree after the other, sequentially on the full dataset (with sample weights updated from the outcome of the previous steps). As the weights have not much impact on the training time of the trees that should not impact the linear dependency of the complexity on T. – ogrisel Mar 14 '14 at 09:46