
I wanted to compare AdaBoost and decision trees. As a proof of principle, I set the number of estimators in AdaBoost to 1, with a decision tree classifier as the default base estimator, expecting the same result as a plain decision tree.

I did indeed get the same accuracy on my test labels. However, fitting is much faster for AdaBoost, while prediction is a bit slower. AdaBoost seems to be using the same default settings as DecisionTreeClassifier; otherwise the accuracy wouldn't be exactly the same.

Can anyone explain this?

Code

from time import time

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

print("creating classifier")
clf = AdaBoostClassifier(n_estimators=1)
clf2 = DecisionTreeClassifier()

print("starting to fit")

time0 = time()
clf.fit(features_train, labels_train)  # fit AdaBoost
fitting_time = time() - time0
print("time for fitting adaboost was", fitting_time)

time0 = time()
clf2.fit(features_train, labels_train)  # fit decision tree
fitting_time = time() - time0
print("time for fitting dtree was", fitting_time)

time1 = time()
pred_ada = clf.predict(features_test)  # test AdaBoost
test_time = time() - time1
print("time for testing adaboost was", test_time)

time1 = time()
pred_dt = clf2.predict(features_test)  # test decision tree
test_time = time() - time1
print("time for testing dtree was", test_time)

accuracy_ada = accuracy_score(labels_test, pred_ada)  # AdaBoost accuracy
print("accuracy for adaboost is", accuracy_ada)

accuracy_dt = accuracy_score(labels_test, pred_dt)  # decision tree accuracy
print("accuracy for dtree is", accuracy_dt)

Output

time for fitting adaboost was 3.8290421962738037
time for fitting dtree was 85.19442415237427
time for testing adaboost was 0.1834099292755127
time for testing dtree was 0.056527137756347656
accuracy for adaboost is 0.99089874857792948
accuracy for dtree is 0.99089874857792948
galliwuzz
  • What is the dimension of `features_train`? When I repeat your experiment with 100 3-dimensional samples, the decision tree is about 10 times faster than AdaBoost. – Itamar Katz Nov 12 '16 at 20:14
  • Also, try to use a profiler. IPython's magic `%prun` is a good option. – Itamar Katz Nov 12 '16 at 20:26
  • `features_train` has 3785 samples with 16000 features each. I am interested in the conceptual difference between the two. In what way do the algorithms they employ differ? I would expect AdaBoostClassifier with 1 estimator to do exactly what DecisionTreeClassifier does. – galliwuzz Nov 12 '16 at 21:05
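
(One quick way to settle what the last comment asks — what AdaBoost actually fits by default — is to inspect the fitted base estimator directly. A small sketch on toy data; `make_classification` is just stand-in data, and `get_depth()` requires a reasonably recent scikit-learn:)

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy data; the shape is arbitrary, we only need a fitted model to inspect.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = AdaBoostClassifier(n_estimators=1).fit(X, y)

# estimators_ holds the fitted base estimators; with the defaults this is
# a max_depth=1 decision tree (a stump), not a fully grown tree.
print(clf.estimators_[0])
print(clf.estimators_[0].get_depth())  # -> 1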

2 Answers


I tried to repeat your experiment in IPython, but I don't see such a big difference:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
import numpy as np

x = np.random.randn(3785, 16000)
y = (x[:, 0] > 0.).astype(float)

clf = AdaBoostClassifier(n_estimators=1)
clf2 = DecisionTreeClassifier()

%timeit clf.fit(x, y)
1 loop, best of 3: 5.56 s per loop

%timeit clf2.fit(x, y)
1 loop, best of 3: 5.51 s per loop

Try using a profiler, or first make sure the experiment itself is reproducible.
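
(For completeness, outside IPython the standard-library `cProfile` module gives similar information. A minimal sketch along the same lines, with synthetic data shaped like the run above:)

import cProfile

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with the same shape as the experiment above.
x = np.random.randn(3785, 16000)
y = (x[:, 0] > 0.).astype(float)

# Profile a single fit, sorted by cumulative time, to see where it goes.
cProfile.run('DecisionTreeClassifier().fit(x, y)', sort='cumtime')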

Itamar Katz

The two classifiers you defined in the following lines:

clf = AdaBoostClassifier(n_estimators = 1)
clf2 = DecisionTreeClassifier()

are actually very different classifiers. In the first case (clf), you are defining a single decision stump: one estimator (n_estimators = 1) whose default base estimator is a tree with max_depth=1. This is explained in the documentation:

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html

where it explains:

"the base estimator is DecisionTreeClassifier(max_depth=1)"

In the second case (clf2), you are defining a decision tree with no depth limit: by default, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples. Again, you can find this out by reading the docs:

https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier
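
(If the goal really is an AdaBoost wrapper that mimics a plain decision tree, you can pass an unrestricted tree as the base estimator explicitly. A sketch under those assumptions; note the keyword is base_estimator in scikit-learn releases before 1.2 and estimator from 1.2 onward, and `make_classification` is just stand-in data:)

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; any classification set works here.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Fix random_state so both trees break ties between equal splits identically.
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(random_state=0),
                         n_estimators=1)   # use base_estimator= before sklearn 1.2
tree = DecisionTreeClassifier(random_state=0)

ada.fit(X, y)
tree.fit(X, y)

# With a single estimator and uniform initial weights, the wrapped tree
# should make the same predictions as the standalone one.
print((ada.predict(X) == tree.predict(X)).all())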

The moral of the story is: read the documentation!

Joshua T