Function takes longer time to execute if OpenCV detectMultiScale() is not called before it

Question

I have a time-critical function that takes 30ms on average to execute if I call OpenCV's detectMultiScale() just before it, but 40-45ms if I don't. This seems very strange, as the two functions have nothing to do with each other. The part of the time-critical function that displays this behaviour is VLFeat's HOG feature descriptor call, vl_hog_put_image().

I need to call the function twice, once after detectMultiScale() and once without it, but I need execution to complete within 30ms both times.

void myfunc(cv::Mat& img, std::vector<cv::Rect>& faces)
{

    // other stuff

    vl_hog_put_image(...);

    // other stuff

}

int main()
{    
    cv::CascadeClassifier face_cascade;
    std::string face_cascade_name = "models/haarcascade_frontalface_alt2.xml";
    face_cascade.load(face_cascade_name);
    std::vector<cv::Rect> faces;

    cv::Mat img = cv::imread("img.jpg");


    // TYPE 1 (30ms)
    face_cascade.detectMultiScale(img, faces, 1.2, 2, 0 | CV_HAAR_SCALE_IMAGE, cv::Size(30,30));
    myfunc(img, faces); // takes about 30ms

    //Type 2 (40ms)
    myfunc(img, faces); // takes about 40ms
}

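I also tried an alternative version that eliminates the variable faces as a factor: faces is read from a file, and detectMultiScale() runs on a dummy image with a dummy output vector.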

int main()
{    
    cv::CascadeClassifier face_cascade;
    std::string face_cascade_name = "models/haarcascade_frontalface_alt2.xml";
    face_cascade.load(face_cascade_name);
    std::vector<cv::Rect> faces;

    std::ifstream myfile("facesfile.csv");
    // code to read the text file into 'faces'

    cv::Mat img = cv::imread("img.jpg");

    cv::Mat dummyImage = cv::imread("dummy.jpg");
    std::vector<cv::Rect> dummyVector;


    // TYPE 1 (30ms)
    face_cascade.detectMultiScale(dummyImage, dummyVector, 1.2, 2, 0 | CV_HAAR_SCALE_IMAGE, cv::Size(30,30));
    myfunc(img, faces); // takes about 30ms

    //Type 2 (40ms)
    myfunc(img, faces); // takes about 40ms
}

After extensive profiling, using both Visual Studio's profiler and timing calculations using std::chrono, I have found that the problem lies with the vl_hog_put_image() function but I can't figure out why.
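A minimal sketch of how such std::chrono timing could be set up so that single-run noise does not dominate: the function under test is run many times and the median is compared between the two cases. The helpers `time_runs` and `median_ms` are hypothetical names for illustration, not part of OpenCV or VLFeat, and the number of runs is an assumption.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: calls fn `runs` times and returns each duration in milliseconds.
std::vector<double> time_runs(const std::function<void()>& fn, int runs)
{
    std::vector<double> ms;
    if (runs > 0) ms.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        // steady_clock is monotonic, so it is not affected by system clock adjustments.
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}

// Hypothetical helper: median of the samples (takes a copy, so the caller's order is kept).
double median_ms(std::vector<double> samples)
{
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return n % 2 ? samples[n / 2] : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}
```

For example, `median_ms(time_runs([&] { myfunc(img, faces); }, 100))` could be measured once with detectMultiScale() called beforehand and once without, and the two medians compared.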

VLFeat documentation: http://www.vlfeat.org/api/hog_8c.html#a86e1faec74ae8163db8dd1e0d292c305

I can provide the source code of the vl_hog_put_image() function if it helps as it is open source.

Any help would be greatly appreciated. Even a method to debug this kind of a problem would be useful.

The only explanation I have come up with so far is that OpenCV's detectMultiScale() is parallelized internally, and that this somehow affects the system's memory in a way that allows vl_hog_put_image() to execute faster.
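One way to probe this hypothesis without OpenCV would be to replace the detectMultiScale() call with a plain multi-threaded busy loop and check whether myfunc() still runs faster afterwards. A minimal sketch, assuming that the relevant effect is CPU activity across several threads (which is unconfirmed); `busy_warm_up` is a hypothetical helper, not part of OpenCV or VLFeat.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper: keeps `num_threads` threads busy for roughly `duration`,
// as a stand-in for whatever parallel work detectMultiScale() may be doing.
// Returns the total number of loop iterations across all threads.
unsigned long long busy_warm_up(unsigned num_threads, std::chrono::milliseconds duration)
{
    std::atomic<unsigned long long> total{0};
    std::vector<std::thread> workers;
    const auto deadline = std::chrono::steady_clock::now() + duration;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, deadline] {
            unsigned long long local = 0;
            // Spin until the shared deadline; each thread does at least one iteration.
            do {
                ++local;
            } while (std::chrono::steady_clock::now() < deadline);
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```

Calling `busy_warm_up(4, std::chrono::milliseconds(20))` just before myfunc(), in place of detectMultiScale(), would show whether thread activity alone reproduces the 30ms timing; the thread count and duration here are arbitrary choices.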

The CPU cache could already have the data, so it didn't have to fetch it again. — Tony J, Dec 01 '16 at 03:53
But why/how does `detectMultiScale()` affect that? I have tried other functions in its place, even a `std::this_thread::sleep_for()`. Nothing else triggers this behaviour. — Siddhanta Chakrabarty, Dec 01 '16 at 03:59
faces is a parameter of detectMultiScale() and is then used again immediately in myfunc(), so the CPU would have cached that memory. — Tony J, Dec 01 '16 at 04:06
I see what you mean. I'll edit my post slightly to show another method I tried which eliminates "faces" as a factor. I saved the value of "faces" to a text file and used that. Still the same issue. — Siddhanta Chakrabarty, Dec 01 '16 at 04:12
