Recently I wrote my own modified version of the Cascade Classifier (single-threaded CPU code), which uses OpenCV's XML file.
I want to compare my bare Viola-Jones (VJ) implementation with OpenCV's, so I disabled OpenCL. OpenCV's version takes 19-23 ms to process the whole image, while my code takes 39-49 ms, which is roughly 2 times slower.
I suspect this is because my CPU has 2 cores and OpenCV uses parallel for loops to increase efficiency. Am I right?
If not, how much impact do the parallel loops in OpenCV's code have on overall performance?
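
As a rough sanity check on the two-core hypothesis, here is a minimal sketch based on Amdahl's law. The function names are my own (hypothetical, not part of OpenCV), and it assumes perfect scaling with no threading overhead, which real code will not achieve, so the results are only an upper bound on what parallel loops could contribute.

```cpp
#include <cassert>

// Amdahl's law: ideal speedup on `cores` cores when a fraction
// `parallel_fraction` (0..1) of the single-threaded run time
// parallelizes perfectly. Hypothetical helper, for illustration only.
double amdahl_speedup(double parallel_fraction, int cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}

// Inverse of the formula above: the parallel fraction that would be
// needed to explain an observed speedup on `cores` cores.
// A result above 1.0 means threading alone cannot explain the gap.
double required_parallel_fraction(double observed_speedup, int cores) {
    assert(observed_speedup > 0.0);
    assert(cores >= 2);
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / cores);
}
```

With 2 cores, `required_parallel_fraction(2.0, 2)` gives 1.0, meaning the whole detection would have to run in perfectly parallel loops to account for a 2x difference. My measured ratio ranges from about 1.7 (39/23) to about 2.6 (49/19), so under these assumptions threading could explain most of the gap only if nearly all of the work is parallelized.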