What are some concrete real life examples which can be solved using Boosting/Bagging algorithms? Code snippets would be greatly appreciated.
3 Answers
Ensembles are used to fight overfitting / improve generalization or to fight specific weaknesses / use strength of different classifiers. They can be applied in any classification task.
I used ensembles in my masters thesis. The code is on Github.
Example 1
For example, think of a binary problem where you have to tell if a data point is of class A or B. This could be an image and you have to decide if there is a (A) a dog or (B) a cat on it. Now you have two classifiers (1) and (2) (e.g. two neural networks, but trained in different ways; or one SVM and a decision tree, or ...). They make the following errors:
(1): Predicted
T | A B
R ------------
U A | 90% 10%
E B | 50% 50%
(2): Predicted
T | A B
R ------------
U A | 60% 40%
E B | 40% 60%
You could, for example, combine them to an ensemble by first using (1). If it predicts B
, then you can use (2). Otherwise you stick with it.
Now, what would be the expected error, (falsely) assuming both are independent)?
If the true class is A
, then we predict with 90% the true result. In 10% of the cases we predict B
and use the second classifier. This one gets it right in 60% of the cases. This means if we have A
, we predict A
in 0.9 + 0.1*0.6 = 0.96 = 96%
of the cases.
If the true class is B
, we predict in 50%
of the cases B
. But we also need to get it right the second time, so only in 0.5*0.6 = 0.3 = 30%
of the cases we get it right there.
So in this simple example we made the situation better for one class, but worse for the other.
Example 2
Now, lets say we have 3 classifiers with
Predicted
T | A B
R ------------
U A | 60% 40%
E B | 40% 60%
each, but the classifications are independent. What do you get when you make a majority vote?
If you have class A, the probability that at least two say it is class A is
0.6 * 0.6 * 0.6 + 0.6 * 0.6 * 0.4 + 0.6 * 0.4 * 0.6 + 0.4 * 0.6 * 0.6
= 1*0.6^3 + 3*(0.6^2 * 0.4^1)
= (3 nCr 3) * 0.6 + (3 nCr 2) * (0.6^2 * 0.4^1)
= 0.648
The same goes for the other class. So we improved the classifier to
Predicted
T | A B
R ------------
U A | 65% 35%
E B | 35% 65%
Code
See sklearns page on Ensembles for code.
The most specific example of ensemble learning are random forests.

- 124,992
- 159
- 614
- 958
Ensemble is the art of combining diverse set of learners (individual models) together to improvise on the stability and predictive power of the model.
Ensemble Learning Techniques:
Bagging : Bagging tries to implement similar learners on small sample populations and then takes a mean of all the predictions. In generalized bagging, you can use different learners on different population.
Boosting : Boosting is an iterative technique which adjust the weight of an observation based on the last classification. If an observation was classified incorrectly, it tries to increase the weight of this observation and vice versa. Boosting in general decreases the bias error and builds strong predictive models.
Stacking : This is a very interesting way of combining models. Here we use a learner to combine output from different learners. This can lead to decrease in either bias or variance error depending on the combining learner we use.
for more reference: Basics of Ensemble Learning Explained
Here's Python based pseudo code for basic Ensemble Learning:
# 3 ML/DL models -> first_model, second_model, third_model
all_models = [first_model, second_model, third_model]
first_model.load_weights(first_weight_file)
second_model.load_weights(second_weight_file)
third_model.load_weights(third_weight_file)
def ensemble_average(models: List [Model]): # averaging
outputs = [model.outputs[0] for model in all_models]
y = Average()(outputs)
model = Model(model_input, y, name='ensemble_average')
pred = model.predict(x_test, batch_size = 32)
pred = numpy.argmax(pred, axis=1)
E = numpy.sum(numpy.not_equal(pred, y_test))/ y_test.shape[0]
return E
def ensemble_vote(models: List [Model]): # max-voting
pred = []
yhats = [model.predict(x_test) for model in all_models]
yhats = numpy.argmax(yhats, axis=2)
yhats = numpy.array(yhats)
for i in range(0,len(x_test)):
m = mode([yhats[0][i], yhats[1][i], yhats[2][i]])
pred = numpy.append(pred, m[0])
E = numpy.sum(numpy.not_equal(pred, y_test))/ y_test.shape[0]
return E
# Errors calculation
E1 = ensemble_average(all_models);
E2 = ensemble_vote(all_models);

- 145
- 1
- 8