
I'm implementing the Viola-Jones algorithm for face detection, and I'm having trouble with the first stage of the AdaBoost learning part of the algorithm.

The original paper states:

> The weak classifier selection algorithm proceeds as follows. For each feature, the examples are sorted based on feature value.

I'm currently working with a relatively small training set of 2000 positive images and 1000 negative images; the paper describes training sets as large as 10,000 images.

The main purpose of AdaBoost here is to reduce the number of features considered in a 24x24 window, which totals 160,000+; the algorithm evaluates these features and selects the best ones.

The paper says that for each feature, its value is computed on every image, and the examples are then sorted by that value. As I understand it, that means I need a container for each feature that stores the feature's value for every sample.

My problem is that my program runs out of memory after evaluating only 10,000 of the features (about 6% of them). The combined size of all the containers ends up being 160,000 * 3,000 values, which is nearly half a billion entries, and far more than that in bytes once per-object overhead is counted. How am I supposed to implement this algorithm without running out of memory? I've increased the heap size, which got me from 3% to 6%, but I don't think increasing it much more will work.

The paper implies that these sorted values are needed throughout the algorithm, so I can't discard them after each feature.

Here's my code so far:

public static List<WeakClassifier> train(List<Image> positiveSamples, List<Image> negativeSamples, List<Feature> allFeatures, int T) {
    List<WeakClassifier> solution = new LinkedList<WeakClassifier>();

    // Initialize Weights for each sample, whether positive or negative
    float[] positiveWeights = new float[positiveSamples.size()];
    float[] negativeWeights = new float[negativeSamples.size()];

    float initialPositiveWeight = 0.5f / positiveWeights.length;
    float initialNegativeWeight = 0.5f / negativeWeights.length;

    for (int i = 0; i < positiveWeights.length; ++i) {
        positiveWeights[i] = initialPositiveWeight;
    }
    for (int i = 0; i < negativeWeights.length; ++i) {
        negativeWeights[i] = initialNegativeWeight;
    }

    // Each feature's value for each image
    List<List<FeatureValue>> featureValues = new LinkedList<List<FeatureValue>>();

    // For each feature, get its value for each image and sort by value
    int currentFeature = 0; // progress counter
    for (Feature feature : allFeatures) {
        List<FeatureValue> thisFeaturesValues = new LinkedList<FeatureValue>();

        int index = 0;
        for (Image positive : positiveSamples) {
            int value = positive.applyFeature(feature);
            thisFeaturesValues.add(new FeatureValue(index, value, true));
            ++index;
        }
        index = 0;
        for (Image negative : negativeSamples) {
            int value = negative.applyFeature(feature);
            thisFeaturesValues.add(new FeatureValue(index, value, false));
            ++index;
        }

        Collections.sort(thisFeaturesValues);

        // Add this feature to the list
        featureValues.add(thisFeaturesValues);
        ++currentFeature;
    }

    ... rest of code
  • The original paper says: "Given that the base resolution of the detector is 24x24, the exhaustive set of rectangle features is quite large, **45,396**". Not 160,000. How do you get 160,000? –  Dec 07 '12 at 20:00
  • You also shouldn't have to store all the features explicitly. Just the values extracted for one of the features across all training patches at a time. At each step of the boosting algorithm, you can evaluate the usefulness of each feature, choose the best one, and add it into your strong classifier. You never need to actually have results for all features across all images in memory at the same time. –  Dec 07 '12 at 20:02
  • Each of the 5 features has approximately 45,396 possible locations/sizes. The paper does mention 160,000 total features. The 2x2 feature (diagonal areas) has fewer possibilities, which is how it reaches that total. – robev Dec 09 '12 at 01:22
  • Okay. My answer still stands. –  Dec 09 '12 at 01:28

1 Answer


This should be the pseudocode for the selection of one of the weak classifiers:

normalize the per-example weights  // one float per example

for feature j from 1 to 45,396:
  // Training a weak classifier based on feature j.
  - Extract the feature's response from each training image (1 float per example)
  // This threshold selection and error computation is where sorting the examples
  // by feature response comes in.
  - Choose a threshold to best separate the positive from negative examples
  - Record the threshold and weighted error for this weak classifier

choose the best feature j and threshold (lowest error)

update the per-example weights
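
As a rough Java sketch of that loop, reusing the `Image`, `Feature`, `FeatureValue`, and `WeakClassifier` types from your snippet (the `WeakClassifier` constructor and the `FeatureValue` accessors below are my assumptions, so adapt them to your classes):

static WeakClassifier selectBestWeakClassifier(List<Image> positives, List<Image> negatives,
                                               float[] posWeights, float[] negWeights,
                                               List<Feature> allFeatures) {
    // T+ and T-: total positive and total negative weight (weights assumed already normalized).
    float totalPos = 0f, totalNeg = 0f;
    for (float w : posWeights) totalPos += w;
    for (float w : negWeights) totalNeg += w;

    WeakClassifier best = null;
    float bestError = Float.MAX_VALUE;

    for (Feature feature : allFeatures) {
        // Responses for this one feature only; eligible for garbage collection after this iteration.
        List<FeatureValue> responses = new ArrayList<FeatureValue>(positives.size() + negatives.size());
        int index = 0;
        for (Image positive : positives) {
            responses.add(new FeatureValue(index++, positive.applyFeature(feature), true));
        }
        index = 0;
        for (Image negative : negatives) {
            responses.add(new FeatureValue(index++, negative.applyFeature(feature), false));
        }
        Collections.sort(responses);

        // Scan the sorted responses. S+ and S- are the positive/negative weight below the
        // current candidate threshold; the error formula is the one from the paper.
        float sumPos = 0f, sumNeg = 0f;
        for (FeatureValue fv : responses) {
            float error = Math.min(sumPos + (totalNeg - sumNeg), sumNeg + (totalPos - sumPos));
            if (error < bestError) {
                bestError = error;
                best = new WeakClassifier(feature, fv.getValue(), error);  // assumed constructor
            }
            if (fv.isPositive()) {            // assumed accessors on your FeatureValue class
                sumPos += posWeights[fv.getIndex()];
            } else {
                sumNeg += negWeights[fv.getIndex()];
            }
        }
    }
    return best;
}

The important property is that `responses` only ever holds one feature's values, so memory use scales with the number of training examples, not with the number of features. (In the full algorithm you would also record the parity, i.e. the direction of the inequality, alongside the threshold.)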

Nowhere do you need to store billions of feature values. Just extract the feature responses on the fly on each iteration. You're using integral images, so extraction is fast. The integral images themselves are the main memory cost, and it's not much: one integer for every pixel in every image, basically the same amount of storage as your raw images.
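
For completeness, here is roughly what the integral image computation looks like, assuming your `Image` class wraps a plain `int[][]` of grey values (which may not match your actual representation):

// Summed-area table: ii[y][x] = sum of all pixels above and to the left of (x, y), inclusive.
// One int per pixel: the "one integer for every pixel in every image" cost mentioned above.
static int[][] integralImage(int[][] pixels) {
    int h = pixels.length, w = pixels[0].length;
    int[][] ii = new int[h][w];
    for (int y = 0; y < h; ++y) {
        int rowSum = 0;
        for (int x = 0; x < w; ++x) {
            rowSum += pixels[y][x];
            ii[y][x] = rowSum + (y > 0 ? ii[y - 1][x] : 0);
        }
    }
    return ii;
}

// Sum of the rectangle with corners (x0, y0) and (x1, y1), inclusive, in four lookups.
static int rectSum(int[][] ii, int x0, int y0, int x1, int y1) {
    int a = (x0 > 0 && y0 > 0) ? ii[y0 - 1][x0 - 1] : 0;
    int b = (y0 > 0) ? ii[y0 - 1][x1] : 0;
    int c = (x0 > 0) ? ii[y1][x0 - 1] : 0;
    return ii[y1][x1] - b - c + a;
}

With these two helpers any rectangle feature costs a handful of array lookups, so recomputing responses on each boosting round is cheap.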

Even if you did compute all the feature responses for all images up front and save them, so you don't have to recompute them every iteration, that would still only take:

  • 45396 * 3000 * 4 bytes =~ 520 MB, or if you're convinced there are 160000 possible features,
  • 160000 * 3000 * 4 bytes =~ 1.78 GB, or if you use 10000 training images,
  • 160000 * 10000 * 4 bytes =~ 5.96 GB

Basically, you shouldn't be running out of memory even if you do store all the feature values.
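
One caveat if you do decide to cache all the responses: store them as primitives. A `List<List<FeatureValue>>` pays for a list node and a `FeatureValue` object header per value, which is roughly an order of magnitude more than the 4 bytes the value itself needs, and that is most likely why you hit the heap limit so early. A sketch of a flat primitive layout, using the variable names from your `train` method:

// Flat primitive storage: one int per (feature, example) pair and nothing else.
// 45,396 features * 3,000 examples * 4 bytes is the ~520 MB figure above.
int numExamples = positiveSamples.size() + negativeSamples.size();
int[] responses = new int[allFeatures.size() * numExamples];

int f = 0;
for (Feature feature : allFeatures) {
    int base = f * numExamples;
    int col = 0;
    for (Image img : positiveSamples) {
        responses[base + col] = img.applyFeature(feature);
        ++col;
    }
    for (Image img : negativeSamples) {
        responses[base + col] = img.applyFeature(feature);
        ++col;
    }
    ++f;
}

// Response of feature f on example i: responses[f * numExamples + i]

You would still sort per feature when searching for a threshold (or keep a per-feature array of example indices sorted by response), but the raw storage then stays within the figures above.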

  • Is this wrapped in another loop for t = 1, ..., T as the paper describes? How do you update the example weights? Do you check whether the chosen feature classifies each example properly? When you pick a feature and then look for another, do you remove the previously chosen feature from the feature list (so you don't pick it again)? I ask because I was under the impression that you store all the values at the beginning and then find the features; I guess that isn't smart given my memory issue. I suspect I'm running out of memory because I'm storing objects instead of plain numbers. – robev Dec 09 '12 at 01:41
  • Yes, this is just the inner loop, which is done `T` times. The formula for the updated example weights is in the paper (step 4 in their pseudocode); I could repeat it, but perhaps that's best for a separate question. When you choose a feature (the best feature at a given iteration), you add it to the strong classifier by storing the feature id, threshold, and weight (three floats). Then move to the next iteration and find the new best feature. Again though, these seem to be questions about the algorithm, not memory usage, so perhaps best for a separate question. –  Dec 09 '12 at 01:47
  • If you post follow-up questions, you could paste their links here for me (and other users). –  Dec 09 '12 at 01:56