4

I can't train the SVM to recognize my object. I'm trying to do this using SURF + Bag Of Words + SVM. My problem is that the classifier does not detect anything. All the results are 0.

Here is my code:

// Global SURF keypoint detector and descriptor extractor shared by the
// helper functions below.
// NOTE(review): these factory calls run during static initialization,
// i.e. BEFORE main() calls initModule_nonfree(). SURF may not be
// registered yet, so create("SURF") can return a NULL Ptr — verify these
// are non-null, or construct them inside main() after initModule_nonfree().
Ptr<FeatureDetector> detector = FeatureDetector::create("SURF");
Ptr<DescriptorExtractor> descriptors = DescriptorExtractor::create("SURF");

// Converts an integer to its decimal string representation.
// NOTE(review): this shadows std::to_string (C++11); kept under the same
// name for compatibility with existing callers.
std::string to_string(const int val) {
    // The original copied `val` into a local and the result into a second
    // local; a single ostringstream round-trip is sufficient.
    std::ostringstream out;
    out << val;
    return out.str();
}

// Detects SURF keypoints on `image`, keeps at most the 1500 strongest,
// and returns their SURF descriptors (one row per keypoint).
// The returned Mat may be empty if no keypoints were detected.
// NOTE(review): assumes `image` is a valid non-empty grayscale image and
// that the global `detector`/`descriptors` pointers are non-null — confirm
// (see the note on their declarations about static-init order).
Mat compute_features(Mat image) {
    vector<KeyPoint> keypoints;
    Mat features;

    detector->detect(image, keypoints);
    // Bound the descriptor count per image by keeping only the strongest
    // responses.
    KeyPointsFilter::retainBest(keypoints, 1500);
    descriptors->compute(image, keypoints, features);

    return features;
}

/**
 * Walks every regular file in `dir`, computes SURF descriptors for each
 * image and feeds them to the BoW k-means trainer.
 *
 * @param dir        directory path; must end with '/' because filenames are
 *                   appended to it directly.
 * @param bowTrainer trainer that accumulates descriptors (modified in place).
 * @return a copy of the updated trainer (kept for interface compatibility;
 *         callers already see the update through the reference).
 */
BOWKMeansTrainer addFeaturesToBOWKMeansTrainer(String dir, BOWKMeansTrainer& bowTrainer) {
    struct dirent *dirp;
    struct stat filestat;

    DIR *dp = opendir(dir.c_str());
    if (!dp) {
        // Was dereferenced unchecked before: readdir(NULL) is undefined.
        cout << "Cannot open directory: " << dir << endl;
        return bowTrainer;
    }

    Mat features;
    Mat img;
    string filepath;

    // NOTE(review): dropped the MSVC `#pragma loop(hint_parallel(4))` — the
    // body shares `dirp`, `img`, `features` and the trainer, so this loop is
    // not safely parallelizable anyway.
    while ((dirp = readdir(dp))) {
        filepath = dir + dirp->d_name;

        cout << "Reading... " << filepath << endl;

        if (stat( filepath.c_str(), &filestat )) continue;  // unreadable entry
        if (S_ISDIR( filestat.st_mode ))         continue;  // skip ".", ".." and subdirs

        img = imread(filepath, 0);  // load as grayscale
        if (img.empty()) {
            // imread failed (non-image file, bad path, ...): skipping avoids
            // running SURF on an empty Mat.
            cout << "Skipping unreadable image: " << filepath << endl;
            continue;
        }

        features = compute_features(img);
        if (features.empty()) {
            // No keypoints found; adding an empty Mat makes the k-means
            // trainer assert later.
            continue;
        }
        bowTrainer.add(features);
    }

    closedir(dp);  // was leaked in the original

    return bowTrainer;
}

/**
 * Computes one BoW descriptor (a histogram over the vocabulary) per image in
 * `dir` and appends it to `trainingData`, with the matching class `label`
 * appended to `labels`.
 *
 * @param dir          directory path; must end with '/'.
 * @param trainingData one row per image is appended (CV_32FC1).
 * @param labels       one float label per appended row.
 * @param bowDE        extractor whose vocabulary must already be set.
 * @param label        class id stored for every image in this directory.
 */
void computeFeaturesWithBow(string dir, Mat& trainingData, Mat& labels, BOWImgDescriptorExtractor& bowDE, int label) {
    struct dirent *dirp;
    struct stat filestat;

    DIR *dp = opendir(dir.c_str());
    if (!dp) {
        // readdir(NULL) was undefined behavior in the original.
        cout << "Cannot open directory: " << dir << endl;
        return;
    }

    vector<KeyPoint> keypoints;
    Mat features;
    Mat img;
    string filepath;

    // NOTE(review): removed the MSVC parallel pragma — the loop mutates
    // shared state (`keypoints`, `features`, the output Mats) serially.
    while ((dirp = readdir(dp))) {
        filepath = dir + dirp->d_name;

        cout << "Reading: " << filepath << endl;

        if (stat( filepath.c_str(), &filestat )) continue;
        if (S_ISDIR( filestat.st_mode ))         continue;

        img = imread(filepath, 0);
        if (img.empty()) {
            cout << "Skipping unreadable image: " << filepath << endl;
            continue;
        }

        detector->detect(img, keypoints);
        if (keypoints.empty()) {
            // BUG FIX: with no keypoints, bowDE.compute leaves `features`
            // untouched, so the ORIGINAL code pushed the previous image's
            // descriptor again under this image's label — corrupting the
            // training set.
            continue;
        }
        bowDE.compute(img, keypoints, features);
        if (features.empty()) continue;

        trainingData.push_back(features);
        labels.push_back((float) label);
    }

    closedir(dp);  // was leaked in the original

    cout << string( 100, '\n' );
}

int main() {
    initModule_nonfree();

    Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("FlannBased");

    TermCriteria tc(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 10, 0.001);
    int dictionarySize = 1000;
    int retries = 1;
    int flags = KMEANS_PP_CENTERS;
    BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);
    BOWImgDescriptorExtractor bowDE(descriptors, matcher);

    string dir = "./positive_large", filepath;
    DIR *dp;
    struct dirent *dirp;
    struct stat filestat;

    cout << "Add Features to KMeans" << endl;
    addFeaturesToBOWKMeansTrainer("./positive_large/", bowTrainer);
    addFeaturesToBOWKMeansTrainer("./negative_large/", bowTrainer);

    cout << endl << "Clustering..." << endl;

    Mat dictionary = bowTrainer.cluster();
    bowDE.setVocabulary(dictionary);

    Mat labels(0, 1, CV_32FC1);
    Mat trainingData(0, dictionarySize, CV_32FC1);


    cout << endl << "Extract bow features" << endl;

    computeFeaturesWithBow("./positive_large/", trainingData, labels, bowDE, 1);
    computeFeaturesWithBow("./negative_large/", trainingData, labels, bowDE, 0);

    CvSVMParams params;
    params.kernel_type=CvSVM::RBF;
    params.svm_type=CvSVM::C_SVC;
    params.gamma=0.50625000000000009;
    params.C=312.50000000000000;
    params.term_crit=cvTermCriteria(CV_TERMCRIT_ITER,100,0.000001);
    CvSVM svm;

    cout << endl << "Begin training" << endl;

    bool res=svm.train(trainingData,labels,cv::Mat(),cv::Mat(),params);

    svm.save("classifier.xml");

    //CvSVM svm;
    svm.load("classifier.xml");

    VideoCapture cap(0); // open the default camera

    if(!cap.isOpened())  // check if we succeeded
        return -1;

    Mat featuresFromCam, grey;
    vector<KeyPoint> cameraKeyPoints;
    namedWindow("edges",1);
    for(;;)
    {
        Mat frame;
        cap >> frame; // get a new frame from camera
        cvtColor(frame, grey, CV_BGR2GRAY);
        detector->detect(grey, cameraKeyPoints);
        bowDE.compute(grey, cameraKeyPoints, featuresFromCam);

        cout << svm.predict(featuresFromCam) << endl;
        imshow("edges", frame);
        if(waitKey(30) >= 0) break;
    }   

        return 0;
}

You should know that I got the parameters from an existing project with a good results, so I thought they'll be useful in my code too (but eventually maybe not).

I have 310 positive images and 508 negative images. I tried to use equal numbers of positive and negative images but the result is the same. The object I want to detect is car steering wheel. Here is my dataset.

Do you have any idea what I'm doing wrong? Thank you!

dephinera
  • 3,703
  • 11
  • 41
  • 75

1 Answer

7

First of all, using the same parameters as an existing project doesn't prove that you are using correct parameters. In fact, in my opinion it is a fundamentally flawed approach (no offense). That is because SVM parameters are directly affected by the dataset and the descriptor extraction method. In order to get correct parameters you have to do cross-validation. So if those parameters were obtained from a different recognition task, they won't make any sense here. For example, in my face verification project the optimal parameters were 0.0625 and 10 for gamma and C respectively.

Other important issue with your approach is test images. As far as I see from your code, you are not using images from disk to test your classifier, so from rest of here I'll do some assumptions. If your test images, that you are acquired from camera are different from your positive images, it will fail. By different I mean this; you have to be sure that your test images are composed only of steering wheels, because your training images contain only steering wheels. If your test image contains, for instance car seat with it, your BoW descriptor for test image will be completely different from your train images BoW descriptor. So, simply, your test images shouldn't contain steering wheels with some other objects, they should only contain steering wheels.

If you satisfy these, using training images for testing your system is the most basic sanity check. If you are failing even in that scenario, you probably have some implementation issues. Another approach is this: split your training data into two, so that you have four partitions:

  • Positive train images
  • Negative train images
  • Positive test images
  • Negative test images

Use only train images for training the system and test it with the test images. And again, you have to specify parameters via cross-validation.

Other than these, you might want to check some specific steps in order to localize the problem, before doing the previous things that I wrote:

  1. How many keypoints are detected for each image? Similar images should result in similar number of keypoints.
  2. You know that BoW descriptor is a histogram of the SURF descriptors of an image. Be sure that similar images result in similar histograms (BoW descriptors). It is better for you to check this by visualizing the histograms.
  3. If the previous step is satisfied, the problem is most probably with the SVM training step, which is a very important step (maybe the most important one).

I hope I was able to emphasize the importance of the cross-validation. Do the cross-validation!

Good luck!

guneykayim
  • 5,210
  • 2
  • 29
  • 61
  • Thank you for your detailed answer!!! I need to ask you something. So let's say that my training set of steering wheels is only steering wheels with a white background. You are telling me that if my test images have a background (the speedometer, seats etc) it won't work? – dephinera Jan 23 '15 at 15:52
  • Actually I didn't mean that, however it might also be the reason. It depends on the amount of descriptors found on the image other than the steering wheel. What I meant is this; if your test image contains a steering wheel, a dog and a bicycle (all in one image), it won't work. What I'm saying is it is not only about the background, it is all about the other parts around the steering wheel. – guneykayim Jan 23 '15 at 16:00
  • Okay, to get it clear enough. Is this a good test image: http://upload.wikimedia.org/wikipedia/commons/d/dc/Volvo_steering_wheel.jpg ? – dephinera Jan 23 '15 at 16:06
  • Not perfect, there is a chance that it might work, but it will most probably fail. Because, SURF will find a lot more keypoints in this image with compared to the images with white background. So, your BoW descriptors will be different. Try testing with training images. – guneykayim Jan 23 '15 at 16:11
  • So this means that the SVM wont be useful in a real situation where the steering wheel has a background with speedometer and other stuff? – dephinera Jan 23 '15 at 16:20
  • No, it is not about SVM. Your training images should be similar with your test images, that's the problem. You can't expect to get correct results when your train and test sets are not similar. Or, you have to use another descriptor than BoW which will be invariant to this kind of dissimilarities. – guneykayim Jan 23 '15 at 16:26
  • I think I'm starting to get it. But when the test images are similar to the train images, what will happen when the SVM is trained and I start to use it on an images with the object in it's natural place. Steering wheels are expected to be in a car wth other things behind it, not with background as it is in my dataset. – dephinera Jan 23 '15 at 17:47
  • Then your training images should contain objects in their natural place. They should have other things behind it, not with white background as it is in your dataset. Your dataset is not proper for your approach. Either you have to change your dataset or your approach (SURF + BoW). – guneykayim Jan 23 '15 at 21:04
  • By the way, your problem might be solved just by optimizing the SVM parameters. But even in that case, selecting training images from their natural place will increase the performance. – guneykayim Jan 23 '15 at 21:27
  • Hi again. Sorry I keep asking.. I'd like to ask you what could be the reason for my svm to predict every circle as a steering wheel? – dephinera Jan 30 '15 at 18:27
  • No worries.. You might need to add a variety of circular objects (not steering wheels) to the negative train set. With this you'll cause the SVM not to select circularity as a discriminative property, because both positive and negative train sets will contain circular objects. Remember, if you apply this kind of change, you need to do cross-validation and parameter optimization again. Using the previous parameters would result in erroneous decisions. – guneykayim Jan 31 '15 at 04:08