I am using cv::matchTemplate to track a moving object in a video.
However, running OpenCV's template matching on a small image can be slower on a better/newer Intel CPU. The code snippet below typically runs 2 times slower on an i9-7920X (0.28 ms/match) than on an i7-9700K (0.14 ms/match).
#include <chrono>
#include <iostream>
#include <thread>
#include <opencv2/opencv.hpp>

#pragma optimize("", off)

int main()
{
    cv::Mat haystack;
    cv::Mat needle;
    cv::Mat result;
    cv::Rect rect;

    // https://en.wikipedia.org/wiki/Barack_Obama#/media/File:President_Barack_Obama.jpg
    haystack = cv::imread("C:/President_Barack_Obama.jpg");
    if (haystack.empty()) {
        std::cerr << "could not load the image\n";
        return 1;
    }

    // Crop the top-left 64x64 region to use as the search area.
    rect.width = 64;
    rect.height = 64;
    haystack = haystack(rect);

    // Take a 12x12 patch at (50, 50) inside the search area as the template.
    rect.width = 12;
    rect.height = 12;
    rect.x = 50;
    rect.y = 50;
    needle = haystack(rect);

    // Time nbmatch consecutive matches and report the average.
    auto start = std::chrono::high_resolution_clock::now();
    int nbmatch = 10000;
    for (int i = 0; i < nbmatch; i++) {
        cv::matchTemplate(haystack, needle, result, cv::TemplateMatchModes::TM_CCOEFF_NORMED);
    }
    auto end = std::chrono::high_resolution_clock::now();

    std::chrono::duration<double> diff = end - start;
    std::cout << "time per match: " << (diff.count() / nbmatch) * 1000 << " ms\n";

    // Keep the console window open long enough to read the result.
    std::this_thread::sleep_for(std::chrono::seconds(500));
}
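The loop above only reports an average over all calls. To see whether the difference comes from a few slow calls or from every call, a per-call measurement might help. The following is a minimal sketch, not part of my original benchmark: `TimingStats` and `time_per_call` are hypothetical names I made up for illustration, and the helper times any callable rather than `cv::matchTemplate` specifically.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper type: summary of per-call timings in milliseconds.
struct TimingStats {
    double min_ms;
    double median_ms;
    double max_ms;
};

// Runs `work` `warmup` times untimed, then times each of `runs` calls
// separately and returns the minimum, median and maximum duration.
template <typename Work>
TimingStats time_per_call(Work&& work, std::size_t warmup, std::size_t runs) {
    using clock = std::chrono::steady_clock;

    // Untimed warm-up calls (caches, lazy initialisation).
    for (std::size_t i = 0; i < warmup; ++i) {
        work();
    }

    if (runs == 0) {
        return {0.0, 0.0, 0.0};
    }

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = clock::now();
        work();
        const auto t1 = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    std::sort(samples.begin(), samples.end());
    return {samples.front(), samples[samples.size() / 2], samples.back()};
}
```

With the variables from the snippet above, it could be called as `time_per_call([&] { cv::matchTemplate(haystack, needle, result, cv::TM_CCOEFF_NORMED); }, 100, 10000);`. Comparing the minimum with the median on each machine would show whether the best case is the same and only the typical case differs.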
In my real application, I noticed this:
- i7-9700k: 1ms;
- i7-6800k: 1.3ms;
- i9-7920x: 2.8ms;
- i9-9820x: 2.8ms.
Both i9s are slower by an amount that the slight difference in clock speed cannot explain.
Windows 7 or 10 does not make a difference. The code is compiled with Visual Studio 2019 (v142). I use the pre-built OpenCV libraries (building OpenCV from source myself did not help).
Edit: The CPU's ability to scale its frequency seems to have an important impact. When run single-threaded, the i9-7920X still takes 2.8 ms per match if I sleep regularly between matches, but if I yield instead (CPU load of 100%) the time drops to 1.9 ms.
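The sleep-versus-yield comparison can be sketched roughly as follows. This is a simplified illustration, not my actual application code: `Pacing` and `mean_ms_with_pacing` are hypothetical names, the pause length is an assumed parameter, and only the time spent inside `work` is accumulated, so the pause itself does not count toward the result.

```cpp
#include <chrono>
#include <cstddef>
#include <thread>

// How the thread waits between two calls (hypothetical name).
enum class Pacing { Sleep, Yield };

// Calls `work` `runs` times with a pause of length `pause` after each call
// and returns the mean time in milliseconds spent inside `work` only.
template <typename Work>
double mean_ms_with_pacing(Work&& work, Pacing pacing, std::size_t runs,
                           std::chrono::microseconds pause) {
    using clock = std::chrono::steady_clock;
    std::chrono::duration<double, std::milli> busy{0.0};

    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = clock::now();
        work();
        busy += clock::now() - t0;

        if (pacing == Pacing::Sleep) {
            // The thread is descheduled, so the core can go idle.
            std::this_thread::sleep_for(pause);
        } else {
            // The thread stays runnable and keeps the core at 100% load.
            const auto deadline = clock::now() + pause;
            while (clock::now() < deadline) {
                std::this_thread::yield();
            }
        }
    }
    return runs != 0 ? busy.count() / static_cast<double>(runs) : 0.0;
}
```

Passing a lambda that wraps the `cv::matchTemplate` call as `work`, once with `Pacing::Sleep` and once with `Pacing::Yield`, gives the two numbers to compare on each machine.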
Questions:
What could explain this?
Is it possible to bring all of these processors into the same range of computation time when using cv::matchTemplate?
What else could I do to reduce my computation time?