
I am new to data science, and so far I have learnt that bagging only reduces high variance, whereas boosting reduces both variance and bias and thus increases accuracy on both the train and test sets.

I understand how both of them work. It seems that, in terms of accuracy, boosting always performs better than bagging. Please correct me if I am wrong.

Is there any respect in which bagging or bagging-based algorithms are better than boosting, be it memory, speed, handling of complex data, or any other criterion?

Vivi

3 Answers


You're right. Both are good for increasing model accuracy. In fact, boosting is better than bagging in most cases because it keeps learning at each stage. However, when your model is overfitting, boosting will keep overfitting it, while bagging will help, because each tree is built on a fresh bootstrap subset of the data. In short, bagging is better than boosting when you have an overfitting problem.
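
A minimal sketch of that idea, assuming scikit-learn (the synthetic dataset, the 20% label noise from `flip_y`, and the choice of AdaBoost as the boosting method are illustrative assumptions, not part of this answer):

```python
# Compare a single overfit tree, a bagged ensemble, and a boosted ensemble
# on noisy data, where boosting tends to chase the noise.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# flip_y injects label noise, the setting where boosting tends to overfit
X, y = make_classification(n_samples=1000, flip_y=0.2, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "boosting": AdaBoostClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```

On noisy data like this, the bagged ensemble typically scores at least as well as the boosted one, whereas on clean data boosting often comes out ahead.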

Vatsal Gupta

There are two properties of bagging that can make it more attractive than boosting:

  1. It's parallelizable - you can speed up training by roughly 4-8x, depending on your CPU cores, because bagging is embarrassingly parallel (see the sketch after this list).
  2. Bagging is comparatively more robust to noise (paper). Real-life data are rarely as clean as the toy datasets we play with while learning data science. Boosting has a tendency to overfit to noise, while bagging handles it comparatively better.
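
A rough sketch of point 1, assuming scikit-learn (the dataset size and the choice of GradientBoostingClassifier as the boosting method are illustrative):

```python
# Bagging can train its trees in parallel via n_jobs, while boosting
# must fit its trees one after another.
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

# n_jobs=-1 spreads the independent bootstrap fits across all CPU cores
bag = BaggingClassifier(n_estimators=200, n_jobs=-1, random_state=0)
# GradientBoostingClassifier has no n_jobs: each tree depends on the previous one
boost = GradientBoostingClassifier(n_estimators=200, random_state=0)

for name, model in [("bagging", bag), ("boosting", boost)]:
    start = perf_counter()
    model.fit(X, y)
    print(f"{name}: fitted in {perf_counter() - start:.1f}s")
```

The difference is structural: the bootstrap fits in bagging are independent of one another, so they can be dispatched to separate cores, while each boosting stage needs the weights or residuals produced by the previous stage.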
Shihab Shahriar Khan
  • thanks for the information, this is really useful and helped me learn something new. – Vivi Aug 26 '19 at 15:26

The goals of bagging and boosting are quite different. Bagging is an ensemble technique that tries to reduce variance, so you should use it when you have low bias but high variance, e.g. k-NN with a low neighbour count or a fully grown decision tree. Boosting, on the other hand, tries to reduce bias, so it can handle problems of high bias but low variance, e.g. a shallow decision tree.
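
As a small illustration of matching the ensemble to the base learner, here is a sketch assuming scikit-learn (the particular estimators and dataset are only examples):

```python
# Bag a high-variance learner; boost a high-bias one.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Fully grown trees: low bias, high variance -> bagging averages the variance away
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(max_depth=None),  # unrestricted depth
    n_estimators=100,
    random_state=0,
)

# Decision stumps: high bias, low variance -> boosting reduces bias stage by stage
boosted_stumps = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    random_state=0,
)

for name, model in [("bagged full trees", bagged_trees), ("boosted stumps", boosted_stumps)]:
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```

In other words, bagging averages many low-bias, high-variance trees to cut the variance, while boosting stacks many high-bias, low-variance stumps to cut the bias stage by stage.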