
I have a dataset from industry and am testing classification performance on it using a Decision Tree (DT), a Random Forest (RF), and ensemble classifiers (EL) such as Bagging, Boosting, etc.

  • The issue is that I am getting fairly similar accuracy for all classifiers. Do RF and EL depend on DT?
  • Is it fair to draw a performance comparison between DT, RF, and EL in academic papers?

I looked through the existing questions (1), (2), but their objectives are different from my question.

Python

from sklearn.tree import DecisionTreeClassifier  # Decision Tree
from sklearn.ensemble import RandomForestClassifier  # Random Forest
from sklearn.ensemble import AdaBoostClassifier  # Ensemble learner (boosting)

MATLAB

Model = fitctree(X,Y); % Decision Tree
Model = fitensemble(X,Y,'Bag',100,'Tree','Type','classification'); % Random forest (bagged trees)
Model = fitcensemble(X,Y); % Ensemble learner
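
For context, here is a minimal runnable sketch of this comparison in Python, assuming a generic tabular dataset (make_classification is a stand-in for the industrial data, which is not shown):

# Sketch: compare DT, RF, and an ensemble learner with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

# Synthetic stand-in for the industrial dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "EL": AdaBoostClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
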
– Case Msee
  • I don't think it is an issue if two classifiers give you similar accuracy. They don't depend on each other (the three classifiers run independently). There shouldn't be a problem with the comparison of these classifiers itself, but designing a fair comparison is the author's responsibility. By the way, the title says "same" and the question says "similar"; you should fix that, because "same" and "similar" are very different in this context (if it were exactly the "same" accuracy, I would suspect a bug in the program code). – Kota Mori Aug 04 '21 at 03:29
  • I’m voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in the `machine-learning` [tag info](https://stackoverflow.com/tags/machine-learning/info). – desertnaut Aug 04 '21 at 06:42

1 Answer

  1. Yes, RF and AdaBoost are dependent on decision trees. A Random Forest is essentially many decision trees, each trained on a random bootstrap sample of the data (and considering a random subset of features at each split). At inference time, all trees vote and the most popular class is chosen. AdaBoost builds on weak learners; in scikit-learn its default weak learner is a decision tree of depth 1, i.e. a stump (see the sketch after this list).

  2. Ideally, you wouldn't stop at a single decision tree for an ML problem, as Random Forests generally outperform one on the same data. You could try AdaBoost, but which model works best is very much dependent on the dataset at hand, and there are other boosting options as well. More details on the task and the nature of the dataset would help provide better guidance.
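
A short sketch backing up point 1: fitting both ensembles and inspecting them shows they are literally built out of decision trees (the dataset here is synthetic, just for illustration):

# Sketch: the fitted ensembles expose their decision-tree components.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

X, y = make_classification(random_state=0)

rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
ada = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)

print(type(rf.estimators_[0]).__name__)  # DecisionTreeClassifier
print(ada.estimators_[0].get_depth())    # 1 -- a depth-1 stump by default
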

As for whether you can use these models in academic papers: I think you can. Random Forests and boosting are very powerful techniques, so there is no reason to avoid them if they perform well.
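
If the comparison does go into a paper, one common pattern (a sketch only, again on a synthetic stand-in dataset) is repeated cross-validation, so that "similar accuracy" can be judged against run-to-run variance rather than a single split:

# Sketch: repeated stratified CV yields a score distribution per model.
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)

for name, model in [("DT", DecisionTreeClassifier(random_state=0)),
                    ("RF", RandomForestClassifier(random_state=0))]:
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
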

– Ayush Goel