I have a native multithreaded Win32 application written in C++ which has about 3 relatively busy threads and 4 to 6 threads that don't do that much. When it runs in a normal mode total CPU usage adds up to about 15% on an 8-core machine and the application finished in about 30 seconds. And when I restrict the application to only one core by setting the affinity mask to 0x01
it completes faster, in 23 seconds.
I'm guessing it has something to do with the synchronization being cheaper when restricted to one physical core and/or some concurrent memory access issues.
I'm running Windows 7 x64, application is 32-bit. The CPU is Xeon X5570 with 4 cores and HT enabled.
Could anyone explain that behavior in detail? Why that happens and how to predict that kind of behavior ahead of time?
Update: I guess my question wasn't very clear. I would like to know why it gets faster on one physical core, not why it doesn't get above 15% on multiple cores.