3

When profiling an application (using dotTrace), I noticed a very strange thing. I used "wall time" measurement, which should in theory mean that all threads would run for a same amount of time.

But this wasn't true: some threads (actually those I was most interested in) displayed total time about 2 times less than others. For example, profiling ran for 230 seconds, most threads report 230 seconds spent in thread, but 5 threads only show 100-110 seconds. These are not threadpool threads, and they were definitely created and started before profiling started.

What is going on here?

Update I'll add more info that may or may not be relevant. The application in question (it is a game server) has about 20-30 constantly running threads. Most threads follow simple pattern: they check an incoming queue for work, and do work if there is some. The code for thread func looks something like this:

while(true){
    if(TryDequeueWork()){ // if queue is not empty
        DoWork();         // do whatever is was on top
    }else{
        m_WaitHandle.WaitOne(MaxTimeout); // m_WaitHandle gets signaled when work is added to queue
    }
}

The threads that display weird times are like this, except they serve multiple queues, like this:

while(true){
    bool hasAnyWork=false;
    foreach(var queue in m_Queues){
        if(queue.TryDequeueWork()){
            hasAnyWork=true;
            DoWork();
        }
    }
    if(!hasAnyWork){ 
        m_WaitHandle.WaitOne(MaxTimeout); 
    }
}

The weird threads don't do any IO except maybe logging. Other, non-weird threads, do logging too. Time spent waiting for a WaitHandle is reported in profiler; actually, some of the non-weird threads spend almost all of their time waiting (as they never have any work).

The application was running on an 8-core virtual machine (VPS hosting). I don't know what physical processors are used there.

Nevermind
  • 1,521
  • 1
  • 15
  • 16
  • 1
    my wild guess for the day: the reporting is only showing user-mode time, and the threads with low numbers are spending time waiting on kernel objects. (i.e. I/O) – Massif Feb 09 '11 at 14:51
  • A colleague and I noticed some strange results like this while analyzing an ETW trace collected on a hyperthreaded CPU (Sandy Bridge). Is your CPU hyperthreaded? – Matt Feb 10 '11 at 07:03

2 Answers2

1

Did they finish before the profiler finished, perhaps?

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
1

You can only get 100% wall time if

  • your machine has at least as many cores as you have threads
  • a thread does nothing but burn cpu cycles and never gets blocked by a synchronization object (like lock) or an I/O request.

Neither is very likely, few problems scale well. Amdahl's law is relevant.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • Erm, yes, but what is "100% wall time" supposed to mean then? I can only guess it is 100% when the thread time equals the running time of the program. – Hans Passant Feb 09 '11 at 14:58
  • Yes' that's exactly what it's supposed to mean. The profiler ran for 230 seconds. That means that all threads (except those that started and/or finished during profiling) also ran 230 seconds. But according to dotTrace some didn't! That's what I don't understand. – Nevermind Feb 09 '11 at 15:01
  • Well, my answer is quite relevant then. I can't help you guess, you didn't even describe how many threads and how many cores you have, nor how these threads interact with each other and what kind of job they do. There's a nice concurrency analyzer built into the VS2010 Ultimate edition, it does take tooling if you don't know how these threads interact. – Hans Passant Feb 09 '11 at 15:17