
I want to use AdaBoost to choose a good set of features from a large number (~100k). AdaBoost works by iterating through the feature set and adding features based on how well they perform. It chooses features that perform well on samples that were misclassified by the existing feature set.

I'm currently using OpenCV's CvBoost. I got an example working, but from the documentation it is not clear how to pull out the feature indexes that it has used.

Using CvBoost, a 3rd-party library, or my own implementation, how can I pull out a set of features from a large feature set using AdaBoost?
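To be concrete about the mechanism I mean, here is a rough, standalone sketch of AdaBoost with decision stumps, where each boosting round effectively selects one feature. All the names are mine (not from any library), it is brute force, and it is not meant to scale to ~100k features as written:

#include <cmath>
#include <vector>

struct Stump { int feature; float threshold; int polarity; double alpha; };

// Pick the stump (feature, threshold, polarity) with the lowest weighted error.
static Stump bestStump(const std::vector<std::vector<float> >& X,
                       const std::vector<int>& y,            // labels in {-1, +1}
                       const std::vector<double>& w) {
    Stump best; best.feature = -1;
    double bestErr = 1e9;
    const size_t n = X.size(), d = X[0].size();
    for (size_t f = 0; f < d; ++f) {
        for (size_t i = 0; i < n; ++i) {                      // candidate thresholds
            const float t = X[i][f];
            for (int pol = -1; pol <= 1; pol += 2) {
                double err = 0;
                for (size_t j = 0; j < n; ++j) {
                    const int pred = (pol * (X[j][f] - t) > 0) ? 1 : -1;
                    if (pred != y[j]) err += w[j];
                }
                if (err < bestErr) {
                    bestErr = err;
                    best.feature = (int)f;
                    best.threshold = t;
                    best.polarity = pol;
                }
            }
        }
    }
    best.alpha = 0.5 * std::log((1.0 - bestErr) / (bestErr + 1e-12));
    return best;
}

// Run T boosting rounds; the selected features are the chosen stumps' indexes.
std::vector<Stump> adaboostSelect(const std::vector<std::vector<float> >& X,
                                  const std::vector<int>& y, int T) {
    std::vector<double> w(X.size(), 1.0 / X.size());
    std::vector<Stump> chosen;
    for (int t = 0; t < T; ++t) {
        Stump s = bestStump(X, y, w);
        double sum = 0;
        for (size_t i = 0; i < X.size(); ++i) {               // re-weight samples
            const int pred =
                (s.polarity * (X[i][s.feature] - s.threshold) > 0) ? 1 : -1;
            w[i] *= std::exp(-s.alpha * y[i] * pred);
            sum += w[i];
        }
        for (size_t i = 0; i < X.size(); ++i) w[i] /= sum;    // normalize
        chosen.push_back(s);
    }
    return chosen;
}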

Robert
  • This seems off-topic for several reasons: no clear question, requests for third party libraries, opinionated, and broad. – JBentley Sep 21 '14 at 18:26
  • @JBentley - Thank you for your comment. I would argue that the question is clear - how do I get AdaBoost working for feature selection. The body of the question is really to show I have done some leg work in this area. I don't want a 3rd party library recommendation, however if the solution is to use a 3rd party library then that would be great. I would welcome suggestions as to how this question can be improved, because I would really like to get some useful responses. – Robert Sep 21 '14 at 18:37

2 Answers


With the help of @greeness's answer I made a subclass of CvBoost:

std::vector<int> RSCvBoost::getFeatureIndexes() {

    // Iterate over CvBoost's sequence of weak classifiers.
    CvSeqReader reader;
    cvStartReadSeq( weak, &reader );
    cvSetSeqReaderPos( &reader, 0 );

    std::vector<int> featureIndexes;

    int weak_count = weak->total;
    for( int i = 0; i < weak_count; i++ ) {
        CvBoostTree* wtree;
        CV_READ_SEQ_ELEM( wtree, reader );

        // Each weak learner is a tree; with stumps the root split holds the feature.
        const CvDTreeNode* node = wtree->get_root();
        CvDTreeSplit* split = node->split;
        const int index = split->condensed_idx;

        // Only add features that are not already added
        if (std::find(featureIndexes.begin(),
                      featureIndexes.end(),
                      index) == featureIndexes.end()) {
            featureIndexes.push_back(index);
        }
    }

    return featureIndexes;
}
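For reference, this is roughly how I call it (OpenCV 2.x; the class declaration matches the method above, but the training parameters here are just placeholders, not necessarily the ones I used):

#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>
#include <vector>

class RSCvBoost : public CvBoost {
public:
    // Defined above; it walks CvBoost's protected 'weak' sequence.
    std::vector<int> getFeatureIndexes();
};

void selectFeatures(const cv::Mat& samples, const cv::Mat& responses) {
    // samples: one row per sample, one CV_32F column per feature.
    RSCvBoost booster;

    // max_depth = 1 gives decision stumps, i.e. one feature per weak learner.
    CvBoostParams params(CvBoost::REAL, /*weak_count=*/200,
                         /*weight_trim_rate=*/0.95, /*max_depth=*/1,
                         /*use_surrogates=*/false, /*priors=*/0);

    booster.train(samples, CV_ROW_SAMPLE, responses,
                  cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params);

    // Column indexes of the features the boosted classifier actually used.
    std::vector<int> selected = booster.getFeatureIndexes();
    (void)selected;
}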
Robert
  • Were you able to pull it off ? I meant, able to extract the features with this way ? – 4nonymou5 Nov 24 '14 at 19:09
  • Yes - It seemed to work. I tested it by training on a series of x, y points where the label was true if x^2+y^2 > some threshold. I added a third parameter z which was random noise and added a small amount of random noise to y but not x. The algorithm chose x then y then z which is what I would expect. – Robert Nov 24 '14 at 20:22

Disclaimer: I am not an OpenCV user. From the documentation, OpenCV's AdaBoost uses a decision tree (either a classification tree or a regression tree) as the underlying weak learner.

It seems to me this is the way to get the underlying weak learners:

CvBoost::get_weak_predictors
Returns the sequence of weak tree classifiers.

C++: CvSeq* CvBoost::get_weak_predictors()
The method returns the sequence of weak classifiers. 
Each element of the sequence is a pointer to the CvBoostTree class or 
to some of its derivatives.

Once you have access to the sequence of CvBoostTree*, you should be able to inspect which features are contained in each tree, what the split values are, and so on.

If each tree is only a decision stump, only one feature is contained in each weak learner. But if we allow deeper trees, a combination of features can appear in each individual weak learner.
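For the deeper-tree case, something along these lines should collect every feature a single weak tree touches. This is an untested sketch against the old OpenCV 2.x C structures; collectSplitVars is just an illustrative name, and depending on how the variables were set up you may want condensed_idx instead of var_idx:

#include <set>
#include <opencv2/ml/ml.hpp>

static void collectSplitVars(const CvDTreeNode* node, std::set<int>& vars) {
    if (!node || !node->split)               // leaf (or empty) node: nothing to record
        return;
    // A node can carry a primary split plus surrogate splits chained via 'next'.
    for (const CvDTreeSplit* s = node->split; s != 0; s = s->next)
        vars.insert(s->var_idx);
    collectSplitVars(node->left, vars);      // recurse into both children
    collectSplitVars(node->right, vars);
}

Calling collectSplitVars(wtree->get_root(), vars) for each tree in the weak-predictor sequence should then give you the full set of features used.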

I further took a look at the CvBoostTree class; unfortunately the class itself does not provide a public method to check the internal features used. But you might want to create your own subclass inheriting from CvBoostTree and expose whatever functionality you need.

greeness
  • This looks great, thank you! I'm curious: if you are not an OpenCV user, do you know this from another machine learning package, or did you research the answer from scratch? – Robert Sep 21 '14 at 22:23
  • np. I have experience with AdaBoost (I've written my own code), so I know the general idea. – greeness Sep 21 '14 at 22:33
  • I figured it out eventually, thanks for your help :) – Robert Sep 27 '14 at 19:50