How do I combine a few weak learners into a strong classifier? I know the formula, but the problem is that every paper about AdaBoost that I've read has only formulas, without any example. I mean, I have the weak learners and their weights, so I can do what the formula tells me to (multiply each learner by its weight, add the next one multiplied by its weight, and so on), but how exactly do I do that? My weak learners are decision stumps. Each has an attribute and a threshold, so what do I multiply?
2 Answers
If I understand your question correctly, these lecture notes give a great explanation, with a lot of images, of how boosting combines the weak classifiers into a strong classifier:
www.csc.kth.se/utbildning/kth/kurser/DD2427/bik12/DownloadMaterial/Lectures/Lecture8.pdf
Basically, by taking the weighted combination of the separating hyperplanes you create a more complex decision surface (there are great plots showing this in the lecture notes).
Hope this helps.
EDIT
To do it practically:
On page 42 you see the formula alpha_t = 1/2 * ln((1 - e_t) / e_t), which can easily be calculated in a for loop, or directly by vector operations if you are using a numeric library (I'm using numpy, which is really great). The alpha_t values are calculated inside AdaBoost, so I assume you already have these.
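For example (a minimal numpy sketch; the error rates e below are made up, since they come out of your own training run):

import numpy as np

# hypothetical weighted error rates of the weak classifiers, one per round
e = np.array([0.25, 0.30, 0.45])

# alpha_t = 1/2 * ln((1 - e_t) / e_t): the lower the error, the larger the say
alpha = 0.5 * np.log((1.0 - e) / e)
print(alpha)  # approx. [0.549 0.424 0.100]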
You have the mathematical formula on page 38: H(x) = sign(sum over all t of alpha_t * h_t(x)); the big sigma stands for that sum. h_t is the weak classifier function, and it returns either -1 (no) or 1 (yes). alpha_t is basically how good the weak classifier is, and thus how much it has to say in the final decision of the strong classifier (not very democratic).
I hardly ever use for loops, but they're easier to understand and more language independent (this is pythonish pseudocode):
def strongclassifier(x):
    response = 0
    for t in range(T):  # over all weak classifier indices
        response += alpha[t] * h[t](x)
    return sign(response)
Mathematically this is the dot product between the weight vector and the vector of weak responses (basically: strong(x) = sign(alpha · weak(x))).
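With numpy, the loop collapses into exactly that dot product (a sketch, assuming h is a list of weak classifier callables and alpha is a numpy array of their weights):

import numpy as np

def strongclassifier(x, h, alpha):
    # responses of all weak classifiers on x: a vector of +1/-1 values
    responses = np.array([h_t(x) for h_t in h])
    # weighted vote = dot product of the alphas and the weak responses
    return np.sign(alpha @ responses)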
EDIT2
This is what is happening inside strongclassifier(x): the separating hyperplane is basically decided in the function weak(x), so all x's with weak(x) = 1 are on one side of the hyperplane, while those with weak(x) = -1 are on the other side. If you think of them as lines on a plane, each line splits the plane into two parts (always), one side being (-) and the other (+). If you now have 3 infinite lines in the shape of a triangle with their negative sides facing outwards, you get 3 (+)'s inside the triangle and 1 or 2 (-)'s outside, which results (in the strong classifier) in a triangle region that is positive while the rest is negative. It's an oversimplification, but the point still stands, and it works completely analogously in higher dimensions.
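To make the triangle picture concrete, here is a minimal sketch with three linear weak classifiers and equal (made-up) weights; their negative sides face outwards:

import numpy as np

# three half-plane "stumps": +1 on the inner side of each line
weak = [
    lambda x, y: np.sign(y),          # line y = 0, positive above
    lambda x, y: np.sign(x),          # line x = 0, positive to the right
    lambda x, y: np.sign(1 - x - y),  # line x + y = 1, positive below
]
alpha = np.array([1.0, 1.0, 1.0])     # equal say for all three

def strong(x, y):
    return np.sign(sum(a * w(x, y) for a, w in zip(alpha, weak)))

print(strong(0.2, 0.2))    #  1.0: inside the triangle, all three vote +
print(strong(-1.0, -1.0))  # -1.0: outside, two of the three vote -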

- I appreciate your help but there's still something that I don't know. I think you understood my question correctly and these lecture notes are really good, but I still don't know how to implement it. Let's say I have my input data in an ArrayList (Java) and I trained some classifiers; how do I separate the hyperplanes? Please help me! – gadzix90 Aug 31 '12 at 20:53
- I tried to break it down as much as possible, but I'm not breaking it down in Java since I have never used it for numerical problems. – SlimJim Sep 02 '12 at 07:33
- Aren't you using any numerical library in Java for handling matrices and stuff, btw? – SlimJim Sep 02 '12 at 07:42
- No, I'm not, probably because it's my first time with this kind of stuff. I think I understand your explanation, and I hope I'll be able to implement it, because I've got only 2 days left. Anyway, thank you a lot! It's possible that I'll have more questions soon... – gadzix90 Sep 02 '12 at 10:50
- Hey, take a look at this paper: http://www.csie.ntu.edu.tw/~b92109/course/Machine%20Learning/AdaBoostExample.pdf It's a clear example and I understand it well, but please tell me: is f3(x) = 0.423649 * I(x < 2.5) + 0.6496 * I(x < 8.5) + 0.752 * I(x > 5.5), with 0 mistakes, the final result? Or should I somehow compute it further? I mean, how do I know what exactly the 0.42 for (x < 2.5) or the 0.65 for (x < 8.5) is? – gadzix90 Sep 02 '12 at 20:11
- If I'm not mistaken, this will be the final answer, since you have successfully classified all the training data (you won't have any margin maximization (like in SVMs) or anything fancy like that). The 0.42 and 0.65 are the different alphas for the weak classifiers I(x < 2.5) and I(x < 8.5). I didn't read the link too carefully, though, so I might be mistaken. – SlimJim Sep 04 '12 at 07:40
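For concreteness, here is a sketch of evaluating that f3 (assuming each I(cond) stump votes +1 when the condition holds and -1 otherwise, matching the h_t convention above; the coefficients are the quoted alphas):

def I(cond):
    # stump vote: +1 if the condition holds, -1 otherwise
    return 1 if cond else -1

def f3(x):
    s = 0.423649 * I(x < 2.5) + 0.6496 * I(x < 8.5) + 0.752 * I(x > 5.5)
    return 1 if s > 0 else -1

print(f3(1))  #  1: "x < 2.5" and "x < 8.5" both vote +, outweighing "x > 5.5"
print(f3(9))  # -1: only "x > 5.5" votes +, and its alpha alone is not enough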
In vanilla AdaBoost, you don't multiply the learners by any weight during training. Instead, you increase the weight of the misclassified data. Imagine that you have an array, such as [1..1000], and you want to use neural networks to estimate which numbers are primes. (A silly example, but it suffices for demonstration.)
Imagine that you have a class NeuralNet. You instantiate the first one, n1 = NeuralNet.new. Then you have the training set, that is, another array of the primes from 1 to 1000. (You need to make up some feature set for a number, such as its digits.) Then you train n1 to recognize primes on your training set. Let's imagine that n1 is weak, so after the training period ends it won't be able to correctly classify all the numbers 1..1000 into primes and non-primes. Let's imagine that n1 incorrectly says that 27 is prime and 113 is non-prime, and makes some other mistakes. What do you do? You instantiate another NeuralNet, n2, and increase the weight of 27, 113 and the other mistaken numbers, let's say, from 1 to 1.5, and decrease the weight of the correctly classified numbers from 1 to 0.667. Then you train n2. After training, you'll find that n2 corrected most of the mistakes of n1, including no. 27, but no. 113 is still misclassified. So you instantiate n3, increase the weight of 113 to 2, decrease the weight of 27 and the other now correctly classified numbers to 1, and decrease the weight of the previously correctly classified numbers to 0.5. And so on...
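A minimal sketch of that reweighting loop (train_weak is a hypothetical helper standing in for the NeuralNet training; note that the 1.5 and 0.667 factors above are exactly exp(+alpha) and exp(-alpha) for alpha = ln 1.5):

import numpy as np

def boost(X, y, n_rounds, train_weak):
    # y holds labels in {-1, +1}; train_weak fits a learner to weighted data
    n = len(y)
    w = np.full(n, 1.0 / n)            # start with uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        h = train_weak(X, y, w)        # e.g. a NeuralNet or a decision stump
        pred = h(X)                    # vector of +1/-1 predictions
        err = w[pred != y].sum()       # weighted error rate
        a = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-a * y * pred)     # raise misclassified, lower the rest
        w /= w.sum()                   # renormalize
        learners.append(h)
        alphas.append(a)
    return learners, np.array(alphas)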
Am I being concrete enough?

- It's not the answer I'm looking for. You told me something that I already know. Maybe I should describe what my application looks like: my input data is an ArrayList of Elephants. Each Elephant has a size, a weight and a type (it can be Asian or African). My weak classifiers classify whether an elephant is Asian or African by size or weight. I sort my data by the elephants' weight or size and create all possible classifiers (which, in the case of decision stumps, are those with a threshold between each pair of adjacent data points). (more in next comment...) – gadzix90 Aug 28 '12 at 10:07
- Then I pick the best one (the one with the lowest error rate). I classify my data using that one classifier and update the weights of the samples (increasing the weights of misclassified samples and decreasing the weights of those classified correctly). Then I choose the samples with the greatest sample weight (the misclassified ones) and find a classifier which will classify them correctly. "And so on..." BUT I still don't know what my final result is and how to "extract" it. – gadzix90 Aug 28 '12 at 10:08
- E.g. by majority voting. The idea is that between iterations you don't change the weights that much, you do a lot of iterations, and you stop before you are overtrained, i.e. don't try to classify the last 1%. And then you let them vote on the classification of each element. And when there is a lot of discord, you can even introduce an ambiguity function that shows how confident your system is about each elephant's classification. – Boris Stitnicky Aug 29 '12 at 10:04