This is a question about the Viola-Jones Algorithm (used for face detection) as described here
http://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework
and in the original paper
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.4868
My questions are:
- They describe 3 kinds of features but give 4 examples of them. So how many features are they calculating per 24x24 window: 3 or 4? Or are they using every possible size of these 4 features (which would be quite a lot)?
- Obviously each feature can be placed at different positions within that 24x24 window. At how many positions, and at exactly which ones, is it evaluated?
- They describe 3 kinds of features, but obviously these could be modified a lot (for example, A rotated is B). Flipping or inverting feature D would also make sense. Are they using only these 4 types, or are they modifying all of them in many ways?
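
To put a number on "quite a lot", here is a minimal sketch of the count under the "every possible size and position" reading. It rests on assumptions that are mine, not confirmed by the paper: each of the 4 example shapes is scaled only by whole multiples of its smallest form (2x1, 1x2, 3x1 and 2x2 pixels), it is placed at every pixel offset where it still fits inside the window, and no rotated, flipped or inverted variants are included. The names `count_features` and `BASE_SHAPES` are just illustrative.

```python
def count_features(window, base_w, base_h):
    """Count placements of one feature shape inside a window x window patch.

    Assumption: width and height grow independently in whole multiples of
    the base shape, and every pixel offset that keeps the feature inside
    the window counts as a separate feature.
    """
    total = 0
    for w in range(base_w, window + 1, base_w):        # allowed widths
        for h in range(base_h, window + 1, base_h):    # allowed heights
            # number of top-left corners where a w x h rectangle fits
            total += (window - w + 1) * (window - h + 1)
    return total


# Assumed smallest forms (width, height) of the four example features.
BASE_SHAPES = {
    "A": (2, 1),  # two rectangles side by side
    "B": (1, 2),  # two rectangles stacked
    "C": (3, 1),  # three rectangles side by side
    "D": (2, 2),  # four rectangles in a checkerboard
}

counts = {name: count_features(24, w, h) for name, (w, h) in BASE_SHAPES.items()}
total = sum(counts.values())

print(counts)  # {'A': 43200, 'B': 43200, 'C': 27600, 'D': 20736}
print(total)   # 134736
```

So under these assumptions the 4 shapes alone already give 134,736 candidate features per window. Adding a stacked (1x3) version of C would contribute another 27,600, which is why the answer to the third question changes the total noticeably.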