I have a problem where my application scales linearly with the number of threads (think 800 threads giving double the performance of 400 threads on a dual-core CPU). My gut feeling is that threads are sleeping or being blocked, but I can't see it in callgrind.
So does callgrind measure the total time spent in a function, or only the time during which the thread was actually running? In case it is not clear what I am asking, suppose a thread does
i++;
for 2 seconds, then
sleep(1); // thread will not be scheduled to run for at least 1 second
Will i++
be approximately 100% or approximately 66% of the call graph?
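To make the scenario concrete, here is a minimal sketch of the kind of test case I have in mind. It assumes a POSIX system (clock_gettime and sleep); the names now_seconds, busy_increment and run_scenario are made up for illustration and are not from my real application.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock seconds read from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: do i++ for roughly `seconds` of wall-clock time
 * and return how many increments were performed. */
unsigned long busy_increment(double seconds) {
    volatile unsigned long i = 0; /* volatile so the loop is not optimized away */
    double end = now_seconds() + seconds;
    while (now_seconds() < end) {
        i++;
    }
    return i;
}

/* Hypothetical helper: the whole scenario from the question,
 * a busy phase followed by a phase where the thread is not runnable. */
unsigned long run_scenario(double busy_seconds, unsigned int sleep_seconds) {
    assert(busy_seconds >= 0.0);
    unsigned long count = busy_increment(busy_seconds);
    sleep(sleep_seconds); /* thread is descheduled for at least this long */
    return count;
}
```

Calling run_scenario(2.0, 1) from main and running the program under valgrind --tool=callgrind would reproduce the 2 seconds of i++ followed by sleep(1). The clock calls inside the loop will presumably show up in the profile too, so the number to look at would be the inclusive cost of busy_increment.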