
I have a problem where my application's performance scales linearly with the number of threads (think 800 threads giving double the performance of 400 threads on a dual-core CPU). My gut feeling tells me that threads are sleeping or being blocked... but I can't see it in callgrind.

So does callgrind measure the total time spent in a function, or just the time during which the thread was actually running? If it is not clear what I am asking: suppose a thread does

i ++;

for 2 seconds, then

sleep(1);//thread will not be scheduled to run for min 1 second...  

... will i++ account for approximately 100% or approximately 66% of the call graph?

Sebastian Mach
NoSenseEtAl
  • After fixing some grammar and markup, I still do not really understand your question. – Sebastian Mach Dec 20 '11 at 15:20
  • It will be approximately 100% (but why don't you try it and see?) – n. m. could be an AI Dec 20 '11 at 15:39
  • E.g. during 45 seconds of execution time, thread X was running for 22 seconds... When Valgrind calculates the call graph, does it base its calculations on those 22 seconds or on the entire 45 seconds? Another example: if a thread blocks on a network rcv, is that waiting part of the time that is reported? – NoSenseEtAl Dec 20 '11 at 15:39
  • @n.m is there a way to get data with sleeping and blocking measured? I didn't try it because it is not trivial to measure whether a call to a blocking function is being registered as time spent... sleep was just an example – NoSenseEtAl Dec 20 '11 at 15:41
  • You can use the `--collect-systime=yes` option; this will give you syscall times. But this is entirely separate from the default profiling data. I.e. you can see a graph with ~100% taken by `i++` **or** a graph with ~100% taken by `sleep()`. I think you cannot combine the two. – n. m. could be an AI Dec 20 '11 at 16:05
  • Not sure I understand the question. In general, callgrind measures CPU instructions, not time, which would be heavily distorted by valgrind's slowness anyway. If you want to measure realistic times, better use a sampling profiler like sysprof. (On a side note, I don't see how 400 threads on a dual core can do anything but sleep most of the time) – Frank Osterfeld Dec 20 '11 at 16:58
  • @Frank but that makes no sense... isn't waiting on a mutex just a small number of instructions, even though it can take longer than 5000 instructions would? Also, over plenty of runs the slowness doesn't matter that much statistically (the probability is that you'll get data "close" to the real data). – NoSenseEtAl Dec 20 '11 at 17:11
  • Are you saying that your program takes twice as long to run with 800 threads as compared to 400? That makes perfect sense, as there's more thread overhead but no additional parallelization possible. – Mark B Dec 20 '11 at 18:01
  • @Mark, no I get twice the work per second... for example 400->10, 800->20 – NoSenseEtAl Dec 20 '11 at 18:03
  • Oh my, so embarrassing: the reason was sleep(), but in code that wasn't in the folder I was grepping for sleep... Still, the question remains: how to get the time spent in a function, not just the number of CPU cycles used during function execution – NoSenseEtAl Dec 21 '11 at 12:40

1 Answer


Callgrind measures the work a thread does while it is actually running (by default it counts executed instructions), not wall-clock time. So if threads are sleeping, blocked, or interrupting each other, you won't see it in the Callgrind output. All you get is the cost of actually executing each function.
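To make the difference concrete outside of Valgrind, here is a minimal sketch (not from the original thread) that reproduces the spin-then-sleep pattern from the question and reads two POSIX clocks around it. The helper names `now` and `busy_then_sleep` are made up for illustration; `clock_gettime`, `nanosleep`, `CLOCK_MONOTONIC` and `CLOCK_THREAD_CPUTIME_ID` are standard POSIX. It assumes a POSIX system (older glibc versions may need `-lrt` when linking).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: read the given clock and return its value in seconds. */
static double now(clockid_t clock) {
    struct timespec t;
    int rc = clock_gettime(clock, &t);
    assert(rc == 0);
    (void)rc;
    return (double)t.tv_sec + (double)t.tv_nsec / 1e9;
}

/* Hypothetical helper: increment a counter until the calling thread has used
   about busy_s seconds of CPU time, then sleep for sleep_s seconds.
   Reports elapsed wall-clock time and elapsed thread CPU time. */
static void busy_then_sleep(double busy_s, double sleep_s,
                            double *wall_out, double *cpu_out) {
    double wall0 = now(CLOCK_MONOTONIC);
    double cpu0 = now(CLOCK_THREAD_CPUTIME_ID);

    /* The "i++" phase: this consumes CPU time. */
    volatile unsigned long i = 0;
    while (now(CLOCK_THREAD_CPUTIME_ID) - cpu0 < busy_s) {
        i++;
    }

    /* The "sleep" phase: the thread is not scheduled, so it uses
       (almost) no CPU time while wall-clock time keeps advancing. */
    struct timespec nap;
    nap.tv_sec = (time_t)sleep_s;
    nap.tv_nsec = (long)((sleep_s - (double)nap.tv_sec) * 1e9);
    nanosleep(&nap, NULL);

    *wall_out = now(CLOCK_MONOTONIC) - wall0;
    *cpu_out = now(CLOCK_THREAD_CPUTIME_ID) - cpu0;
}
```

Calling `busy_then_sleep(2.0, 1.0, &wall, &cpu)` from `main` and printing both values should show the CPU figure staying near the busy duration while the wall-clock figure also includes the sleep. A profiler that only accounts for executed work sees something like the first number, which is why the sleep does not appear in the call graph.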

Jarryd