To elaborate on @duffymo's answer:
"Ensemble" simply means "collection", so an ensemble is just a collection of different (or identical) models. Think of Random Forest: it is a collection of different decision trees, and we average their outputs to create one "meta" model.
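For intuition, here is a minimal sketch of that "average a collection of models" idea, assuming numpy and scikit-learn are available; the data and the choice of 10 depth-3 trees are made up for illustration:

```python
# Minimal sketch of the ensemble idea behind Random Forest:
# train several trees on bootstrap samples, then average their outputs.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

# Train 10 trees, each on its own bootstrap sample of the training data
trees = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx]))

# The "meta" model: the average of the individual tree predictions
ensemble_pred = np.mean([t.predict(X) for t in trees], axis=0)
```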
I would say that boosting is an ensemble, but one created in a specific way. Different boosting algorithms do it differently, but what they have in common is that they use the errors from the previous model to create a better model in the next step. One way of creating a boosting algorithm would be (sketched in code below the list):
1. Fit some baseline model, `m_0` (for regression it could be the mean of `y_train`)
2. Calculate the errors/residuals, `e`, for `y_train` using the model `M = m_0`
3. Fit a model (it could be a linear regression), `m_1`, to predict `e`
4. Create a new model as `M = m_0 + m_1`
5. Repeat (2)-(4) as many times as you want, so that your model becomes `M = m_0 + m_1 + m_2 + ...`
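A minimal sketch of steps (1)-(5), assuming numpy and scikit-learn; the shallow decision tree as the stage model and all data are made up for illustration (a linear regression, as mentioned in step (3), would plug in the same way):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y_train = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

m_0 = y_train.mean()             # (1) baseline model: the mean of y_train
M = np.full_like(y_train, m_0)   # predictions of the current model M

stages = []
for _ in range(5):               # (5) repeat (2)-(4) a few times
    e = y_train - M              # (2) residuals of the current M
    m_i = DecisionTreeRegressor(max_depth=2).fit(X, e)  # (3) fit m_i to e
    stages.append(m_i)
    M = M + m_i.predict(X)       # (4) M = m_0 + m_1 + ... + m_i

def predict(X_new):
    # The final model: the baseline plus the sum of all residual fits
    return m_0 + sum(m.predict(X_new) for m in stages)
```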
Why does this work?
Since the error `e` is defined as `e = y_train - m_0(x)` (where `m_0(x)` denotes the predictions made with `m_0`), we can train a model `m_1` to predict `e`, i.e. we can approximate `e` by `m_1(x)`. Thus `m_1(x) ≈ y_train - m_0(x)`, which implies `y_train ≈ m_0(x) + m_1(x)` (our model in step (4)). That model is not perfect, so we can iterate again and again, each time adding a new model that fits the residuals of the previous `M`.
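A tiny worked example of that argument, with made-up numbers:

```python
# One training point with y_train = 10 and a baseline prediction m_0(x) = 7
e = 10 - 7        # residual e = y_train - m_0(x) = 3
m_1_x = 2.8       # suppose m_1 learns to predict e, imperfectly
M_x = 7 + m_1_x   # M(x) = m_0(x) + m_1(x) = 9.8, closer to 10 than 7 was
```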
Some algorithms, like XGBoost, also add a "learning rate" `alpha` to each of the models, such that `M = m_0 + alpha*m_1 + alpha*m_2 + ...`, but that's another story.
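In sketch form, that story is a one-line change to the loop above: scale each stage's contribution by `alpha`. The value 0.1 and the round count below are made-up illustrations, not anything XGBoost prescribes:

```python
# Shrinkage variant of the boosting loop; setup repeated from the sketch above.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y_train = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

alpha = 0.1
m_0 = y_train.mean()
M = np.full_like(y_train, m_0)
for _ in range(50):                 # smaller steps per round, so more rounds
    e = y_train - M
    m_i = DecisionTreeRegressor(max_depth=2).fit(X, e)
    M = M + alpha * m_i.predict(X)  # M = m_0 + alpha*m_1 + alpha*m_2 + ...
```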