
I'm trying to use opencv to find traffic cones in an image.

The code uses a variety of methods to crop regions of interest out of the image and find precise bounding boxes for the cones. A pipeline that works quite well is segmentation by color, followed by blob detection, and then a nearest-neighbour classifier to remove false positives.

Instead of blob detection, I would like to use Haar cascades. This should mean that regions are selected based on both color (HSV segmentation) and shape (Haar cascades).

I used this tutorial: http://docs.opencv.org/trunk/dc/d88/tutorial_traincascade.html

I'm only looking for a specific type of traffic cones. They have a known size and color. So I made a template for the shape of a traffic cone. Here is the command I used to generate the training samples, the template and one of the positive samples generated for training (the sample was generated at a higher resolution, for training I'm using 20x20 pixels):

opencv_createsamples -vec blue.vec -img singles/blue.png -bg bg.txt -num 10000 -maxidev 25 -maxxangle 0.15 -maxyangle 0.15 -maxzangle 0.15 -w 20 -h 20 -bgcolor 0 -bgthresh 1

[images: the traffic cone template and one generated positive sample]

The training was done with this command:

opencv_traincascade -data blue_data/ -vec blue.vec -bg bg.txt -numPos 1000 -numNeg 9000 -numStages 10 -numThreads 8 -featureType HAAR -w 20 -h 20 -precalcValBufSize 4096 -precalcIdxBufSize 4096

Training finishes very quickly:

===== TRAINING 3-stage =====
<BEGIN
POS count : consumed   1000 : 1000
NEG count : acceptanceRatio    9000 : 0.00248216
Precalculation time: 18
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3|        1| 0.199556|
+----+---------+---------+
END>
Training until now has taken 0 days 0 hours 5 minutes 44 seconds.

===== TRAINING 4-stage =====
<BEGIN
POS count : consumed   1000 : 1000
NEG count : acceptanceRatio    0 : 0
Required leaf false alarm rate achieved. Branch training terminated.

Now I've applied this to a test image; the cone in the upper left corner is my template, stamped onto the image.

[images: test image with the stamped template, and the detection results]

I don't think the results look good. The template has been found, but there are lots of false positives, and some of them look rather strange to me. For example: there is one cone on the left with a bright base and tip and a black stripe; in color it is yellow and black. It is neatly detected by the classifier. But how? The cone is basically the inverse of the template (I'm not using inverted colors as augmentation for the training samples).

Once the training has finished, there are 3 things you can do to tweak the performance:

  1. Resizing the input image
  2. Changing the scaling factor of detectMultiScale
  3. Changing the minNeighbors parameter.

None of them works for me. In fact, the above example is almost cherry-picked: for most other settings, the detector either finds very many false positives or even misses the template in the corner.

I wanted to take a closer look and cropped a region out of the image. This has been processed with:

cone_cascade.detectMultiScale(crop, scaleFactor=1.0001, minNeighbors=0)

The idea was to make sure that all scales are looked at and all matches are kept.

[images: the cropped region and the raw detections on it]

I don't understand these results at all.

Is there something to keep in mind when using Haar cascades? Am I doing something wrong?

lhk
  • First of all, training a cascade of classifiers is really not an easy task. It looks like you used only one image and artificially created your positive dataset with `opencv_createsamples`. It is better to use multiple real images. The training stops at stage 4, which means that with just 3 features it already manages to classify the pos and neg datasets successfully. That is usually a sign that the negative set is not "hard enough" or the pos set is too "similar". More information [here](http://answers.opencv.org/question/7141/about-traincascade-paremeters-samples-and-other/). – Catree Apr 05 '17 at 15:07
  • Cascades of classifiers are also not rotation invariant. Keep in mind that nowadays object detection is led by the DNN (Deep Neural Network) field. You should be able to train a neural network model that easily outperforms a cascade of classifiers (if you know what you are doing, of course). It could be worth investing some time in these topics, for example ([1](https://github.com/TensorBox/TensorBox), [2](https://github.com/BVLC/caffe/blob/master/examples/detection.ipynb)). Most of the DNN APIs use a Python interface afaik. – Catree Apr 05 '17 at 15:17
  • Hm, I know they are not invariant wrt rotation. I added a small amount of rotation as augmentation during sample creation. And I also know about neural networks. The reason to start with Haar cascades was purely practical: we already use an OpenCV software stack, so it seemed like the easiest way to get shape detection integrated. I completely agree with you: DNNs, or rather CNNs, are state of the art for this type of task. But the traffic cones are so simple that I would have expected better performance from the Haar cascades. The false positives seem to be so obviously wrong ... – lhk Apr 05 '17 at 15:22
  • Your feedback on "hardness" of the negative samples is very interesting. I'll look into that next. Thanks – lhk Apr 05 '17 at 15:22
  • Not just the hardness of the negative samples; you should use many more different (real) positive samples, too. Yes, it is a lot of work to gather and crop/tag the samples, but there is no free lunch in machine learning... The best approach I know is to start with a simple classifier like yours (or just a nice color segmentation), use it in the real world, and auto-crop the resulting detections. Then check those results and collect all the false positives + true positives + find as many false negatives and true negatives as possible and add them to your next training. – Micka Apr 05 '17 at 15:41
  • 5 minutes is definitely too quick; with Haar features a good training takes days. With LBP features it is much quicker (~ one day). Using few real images (or only one) and artificially generating positives by background combination is not the way to go, in my opinion. In this [paper](http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf) they used 4916 hand-labeled images for the pos set and 9500 images for the neg set. In the end, all the work is in creating the datasets. **minHitRate=0.999**, **maxFalseAlarmRate=0.5**, **numStages=20** are "default" values. – Catree Apr 05 '17 at 15:44

1 Answer


Maybe this is no longer relevant after 4 months, but it could help someone else.

Required leaf false alarm rate achieved. Branch training terminated.

This message means that your detector is considered trained enough, so no more stages are needed. However, I suspect your dataset contains very few positive images; as Catree mentioned in the comments above, if you want the detector to do a good job you should add more of them, so the detector can find your objects in different positions and under different lighting.

The more positive images you provide, the more accuracy you will achieve. When I trained a cascade I had about 50 positives, and the result was pretty good but not ideal.

Also, before starting training, convert the images to grayscale and resize them so that training runs faster on a big dataset. A size of up to 100x100 px should be enough. Your negatives can be any size; however, I resized them as well, to 500x500 px.

If you use more positives, you will need to combine them into one info file. You can do this manually or write a small script to do all the work, as I did. Actually, you can use mine if you want; it prepares all the data for training with just a few commands from the terminal. You can clone it from this repo. Call its help guide by typing python create_dataset.py -h in the terminal. I will provide a readme file for convenience in the near future.

Yoni
Michael