
I'm implementing an application that uses AdaBoost to classify whether an elephant is Asian or African. My input data is:

Elephant size: 235  Elephant weight: 3568  Sample weight: 0.1  Elephant type: Asian
Elephant size: 321  Elephant weight: 4789  Sample weight: 0.1  Elephant type: African
Elephant size: 389  Elephant weight: 5689  Sample weight: 0.1  Elephant type: African
Elephant size: 210  Elephant weight: 2700  Sample weight: 0.1  Elephant type: Asian
Elephant size: 270  Elephant weight: 3654  Sample weight: 0.1  Elephant type: Asian
Elephant size: 289  Elephant weight: 3832  Sample weight: 0.1  Elephant type: African
Elephant size: 368  Elephant weight: 5976  Sample weight: 0.1  Elephant type: African
Elephant size: 291  Elephant weight: 4872  Sample weight: 0.1  Elephant type: Asian
Elephant size: 303  Elephant weight: 5132  Sample weight: 0.1  Elephant type: African
Elephant size: 246  Elephant weight: 2221  Sample weight: 0.1  Elephant type: African

I created a Classifier class:

import java.util.ArrayList;

public class Classifier {
private String feature;
private int treshold;
private double errorRate;
private double classifierWeight;

public void classify(Elephant elephant){
    if(feature.equals("size")){
        if(elephant.getSize()>treshold){
            elephant.setClassifiedAs(ElephantType.African);
        }
        else{
            elephant.setClassifiedAs(ElephantType.Asian);
        }           
    }
    else if(feature.equals("weight")){
        if(elephant.getWeight()>treshold){
            elephant.setClassifiedAs(ElephantType.African);
        }
        else{
            elephant.setClassifiedAs(ElephantType.Asian);
        }
    }
}

// Unweighted fraction of elephants whose predicted type differs from the true type.
public void countErrorRate(ArrayList<Elephant> elephants){
    double misclassified = 0;
    for(Elephant elephant : elephants){
        if(!elephant.getClassifiedAs().equals(elephant.getType())){
            misclassified++;
        }
    }
    this.setErrorRate(misclassified/elephants.size());
}

public void countClassifierWeight(){
    this.setClassifierWeight(0.5*Math.log((1.0-errorRate)/errorRate));
}

public Classifier(String feature, int treshold){
    setFeature(feature);
    setTreshold(treshold);
}

// Getters and setters omitted.
}

In main() I trained a classifier that splits on "size" with a threshold of 250, like this:

 main.trainAWeakClassifier("size", 250);

After my classifier classifies each elephant, I compute the classifier's error, update the weight of each sample (elephant), and compute the weight of the classifier. My questions are:

How do I create the next classifier, and how does it pay more attention to the misclassified samples? I know the sample weight is the key, but I don't understand how it works well enough to implement it. Did I create the first classifier properly?

Chris Laplante
gadzix90
  • If you are working with elephants you might want to try a strong learner instead. How do you expect a weak one to handle a 5-ton mammal? – thkala Aug 25 '12 at 20:08
  • I don't really get your point but I appreciate your sense of humor. AdaBoost is based on weak learners. – gadzix90 Aug 25 '12 at 20:45
  • Do you need to implement it from scratch or can you use libraries like Weka or R? – marc_ferna Aug 25 '12 at 21:18
  • The application has to be implemented entirely on my own. – gadzix90 Aug 26 '12 at 06:33
  • The `size` and `weight` should have their own thresholds as they have different scales. – Peter Lawrey Aug 26 '12 at 07:07
  • I know that and that's why my trainAWeakClassifier() method has 2 params. – gadzix90 Aug 26 '12 at 07:15
  • One point of boosting is to find out which features matter, so after a few boosting iterations you should be able to tell from the classifier weights whether size or weight gives better classification results. Classifying only by 'size' is the wrong approach. My tip: run boosting on your data in Weka to understand the process. – count0 Aug 28 '12 at 12:48

1 Answer


You compute the error rate and can classify the instances, but what is missing is the step that updates the classifiers from round to round and combines them into one, per the AdaBoost formula. The algorithm is laid out on Wikipedia's AdaBoost page.
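As a rough illustration of those missing steps, here is a minimal sketch of discrete AdaBoost on the ten elephants from the question. It is not the asker's code: the class `AdaBoostSketch`, the nested `Stump` class, the helper names, and the +1/-1 label encoding are all assumptions made for this example. The stump keeps the question's rule (value > threshold means African). The error here is the sum of the sample weights of the misclassified elephants rather than a plain count, and the next stump is chosen to minimize that weighted error, which is how the misclassified samples get more attention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AdaBoostSketch {
    // Rows are {size, weight} from the question.
    static final double[][] X = {
        {235, 3568}, {321, 4789}, {389, 5689}, {210, 2700}, {270, 3654},
        {289, 3832}, {368, 5976}, {291, 4872}, {303, 5132}, {246, 2221}};
    // Assumed encoding for this sketch: +1 = African, -1 = Asian.
    static final int[] Y = {-1, +1, +1, -1, -1, +1, +1, -1, +1, +1};

    // Hypothetical weak learner: one feature, one threshold, plus its vote weight.
    static class Stump {
        final int feature;      // 0 = size, 1 = weight
        final double threshold;
        double alpha;           // classifier weight, set once the stump is accepted
        Stump(int feature, double threshold) {
            this.feature = feature;
            this.threshold = threshold;
        }
        int predict(double[] x) { return x[feature] > threshold ? +1 : -1; }
    }

    // Weighted error: sum of the sample weights of the misclassified elephants.
    static double weightedError(Stump s, double[] w) {
        double err = 0;
        for (int i = 0; i < X.length; i++) {
            if (s.predict(X[i]) != Y[i]) err += w[i];
        }
        return err;
    }

    // Next weak learner: try every observed value of both features as a threshold
    // and keep the stump with the lowest weighted error under the current weights.
    static Stump bestStump(double[] w) {
        Stump best = null;
        double bestErr = Double.MAX_VALUE;
        for (int f = 0; f < 2; f++) {
            for (double[] row : X) {
                Stump candidate = new Stump(f, row[f]);
                double e = weightedError(candidate, w);
                if (e < bestErr) { bestErr = e; best = candidate; }
            }
        }
        return best;
    }

    // w_i <- w_i * exp(-alpha * y_i * h(x_i)), then normalize so the weights sum to 1.
    static double[] reweight(Stump s, double[] w) {
        double[] next = new double[w.length];
        double sum = 0;
        for (int i = 0; i < w.length; i++) {
            next[i] = w[i] * Math.exp(-s.alpha * Y[i] * s.predict(X[i]));
            sum += next[i];
        }
        for (int i = 0; i < next.length; i++) next[i] /= sum;
        return next;
    }

    // Strong classifier: sign of the alpha-weighted vote of all accepted stumps.
    static int vote(List<Stump> ensemble, double[] x) {
        double sum = 0;
        for (Stump s : ensemble) sum += s.alpha * s.predict(x);
        return sum >= 0 ? +1 : -1;
    }

    public static void main(String[] args) {
        double[] w = new double[X.length];
        Arrays.fill(w, 1.0 / X.length);        // uniform start, 0.1 each
        List<Stump> ensemble = new ArrayList<>();
        Stump current = new Stump(0, 250);     // the question's first classifier
        for (int round = 0; round < 3; round++) {
            double err = weightedError(current, w);
            // This sketch simply stops if alpha is undefined (err = 0)
            // or the stump is no better than chance (err >= 0.5).
            if (err <= 0.0 || err >= 0.5) break;
            current.alpha = 0.5 * Math.log((1.0 - err) / err);
            ensemble.add(current);
            System.out.printf("feature=%d threshold=%.0f err=%.4f alpha=%.4f%n",
                    current.feature, current.threshold, err, current.alpha);
            w = reweight(current, w);
            current = bestStump(w);
        }
        for (double[] x : X) {
            System.out.println(Arrays.toString(x) + " -> " + vote(ensemble, x));
        }
    }
}
```

For the first round, the size > 250 stump misclassifies three elephants (sizes 270, 291 and 246), so the error is 0.3 and the classifier weight is 0.5 · ln(0.7 / 0.3) ≈ 0.42. After the update, each misclassified sample carries weight 1/6 and each correctly classified one 1/14, so the old stump's weighted error becomes exactly 0.5 and the search for the next stump is pushed toward the samples it got wrong. With uniform starting weights the weighted error equals the plain misclassification rate, which is why the first classifier in the question gives the same number either way.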

fiacobelli