I have a time-critical function which takes 30ms (on average) to execute if I call OpenCV's detectMultiScale() just before it. It takes 40-45ms if I don't call detectMultiScale() first. This seems very strange, as the two functions have nothing to do with each other. The part of the time-critical function which displays this behaviour is the call to VLFeat's HOG feature descriptor function, vl_hog_put_image().
I need to call the function twice, once after detectMultiScale() and once without it, and I need the execution to complete within 30ms both times.
void myfunc(cv::Mat& img, std::vector<cv::Rect>& faces)
{
// other stuff
vl_hog_put_image(...);
// other stuff
}
int main()
{
cv::CascadeClassifier face_cascade;
std::string face_cascade_name = "models/haarcascade_frontalface_alt2.xml";
face_cascade.load(face_cascade_name);
std::vector<cv::Rect> faces;
cv::Mat img = cv::imread("img.jpg");
// TYPE 1 (30ms)
face_cascade.detectMultiScale(img, faces, 1.2, 2, 0 | CV_HAAR_SCALE_IMAGE, cv::Size(30,30));
myfunc(img, faces); // takes about 30ms
//Type 2 (40ms)
myfunc(img, faces); // takes about 40ms
}
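As an alternative, I tried to eliminate the variable faces as a factor by reading it from a file and running detectMultiScale() on a dummy image and a dummy vector instead: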
int main()
{
cv::CascadeClassifier face_cascade;
std::string face_cascade_name = "models/haarcascade_frontalface_alt2.xml";
face_cascade.load(face_cascade_name);
std::vector<cv::Rect> faces;
std::ifstream myfile("facesfile.csv");
// code to read the text file into 'faces'
cv::Mat img = cv::imread("img.jpg");
cv::Mat dummyImage = cv::imread("dummy.jpg");
std::vector<cv::Rect> dummyVector;
// TYPE 1 (30ms)
face_cascade.detectMultiScale(dummyImage, dummyVector, 1.2, 2, 0 | CV_HAAR_SCALE_IMAGE, cv::Size(30,30));
myfunc(img, faces); // takes about 30ms
//Type 2 (40ms)
myfunc(img, faces); // takes about 40ms
}
After extensive profiling, using both Visual Studio's profiler and manual timing with std::chrono, I have found that the difference comes from the vl_hog_put_image() call, but I can't figure out why.
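For anyone who wants to reproduce the measurement, here is a minimal sketch of a std::chrono timing harness. TimingStats and time_runs are hypothetical names introduced for this illustration, not part of OpenCV or VLFeat, and this is not my exact timing code. Repeating the call and reporting the minimum, median and maximum should help separate a one-off warm-up cost from a consistent difference between the two cases.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Summary of repeated timings, in milliseconds.
struct TimingStats {
    double min_ms;
    double median_ms;  // upper median when the number of runs is even
    double max_ms;
};

// Hypothetical helper: calls fn() `runs` times (runs must be >= 1),
// times each call with a monotonic clock, and summarises the results.
template <typename Fn>
TimingStats time_runs(Fn&& fn, std::size_t runs)
{
    using clock = std::chrono::steady_clock;

    std::vector<double> ms;
    ms.reserve(runs);

    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = clock::now();
        fn();
        const auto t1 = clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    std::sort(ms.begin(), ms.end());
    return TimingStats{ms.front(), ms[ms.size() / 2], ms.back()};
}

// Usage with the names from the code above (assumed, not compiled here):
//   TimingStats s = time_runs([&] { myfunc(img, faces); }, 50);
//   // then print s.min_ms, s.median_ms and s.max_ms for each case
```

Running this once directly after detectMultiScale() and once without it, with the same number of runs, shows whether only the first few calls are slow or whether every call is.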
VLFeat documentation: http://www.vlfeat.org/api/hog_8c.html#a86e1faec74ae8163db8dd1e0d292c305
I can provide the source code of vl_hog_put_image() if it helps, as it is open source.
Any help would be greatly appreciated. Even a method for debugging this kind of problem would be useful.
The only thing I have been able to figure out so far is that OpenCV's detectMultiScale() is parallelized internally, and I suspect that this somehow affects the state of the system's memory in a way that lets vl_hog_put_image() execute faster.