I have two chunks of code that perform the same operation. One was written by me, the other by a third party. Both are compiled into a single executable. The third-party code appears to do its job much faster than mine: it can perform 1,500 operations per second compared to my 500. I then ran the executable within VTune, using the call graph profiling option, hoping this would reveal where I was wasting time. Unfortunately the VTune diagnostics, which show the number of microseconds it thinks each function takes, claim that both my function and the third-party function take about 0.002 seconds per call. That appears spot on for my code (1/500 s), but it is completely at odds with my (manual) measurement of the third-party code, which works out to roughly 0.0007 seconds per call.
How can this happen?
EDIT: Both chunks of code are large and call their own complex trees of sub-functions.
EDIT: I should point out that the third-party code is pure C++, whereas my code is essentially C code that has just been compiled with a C++ compiler.
EDIT: VTune is a very complex package with loads of configuration options I don't understand. Are there any settings I could change that might reduce this inaccuracy?
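For reference, the kind of manual measurement mentioned above can be sketched as follows. This is only a minimal sketch of the general approach, not the exact harness used: the helper name `seconds_per_call` and its parameters `op` and `iterations` are hypothetical, and it assumes the operation can be wrapped in a callable and run outside the profiler.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of either code base): calls `op`
// `iterations` times and returns the average wall-clock seconds per call.
template <typename Op>
double seconds_per_call(Op&& op, std::size_t iterations) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        op();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling it as `seconds_per_call([] { my_operation(); }, 1000)` and again with the third-party entry point (both names are placeholders) gives per-call times that can be compared directly with the figures VTune reports; the reciprocal of each result is the operations-per-second rate.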