4

I am looking for a way to optimize the following code, from an open source project that I develop, or to make it more performant by moving the heavy work to another thread.

void ProfilerCommunication::AddVisitPoint(ULONG uniqueId)
{
    CScopedLock<CMutex> lock(m_mutexResults);
    m_pVisitPoints->points[m_pVisitPoints->count].UniqueId = uniqueId;
    if (++m_pVisitPoints->count == VP_BUFFER_SIZE)
    {
        SendVisitPoints();
        m_pVisitPoints->count=0;
    } 
}

The above code is used by the OpenCover profiler (an open source code coverage tool for .NET, written in C++) each time a visit point is hit. The mutex protects some shared memory (a 64K block shared between several processes, 32/64 bit and C++/C#); when the block is full, the profiler signals the host process. Obviously this is quite heavy for each instrumentation point and I'd like to make the impact lighter.

I am thinking of using a queue which is pushed to by the above method and a thread to pop the data and populate the shared memory.

Q. Is there a thread-safe queue in C++ (Windows STL) that I can use, or a lock-free queue, as I wouldn't want to replace one issue with another? Do people consider my approach sensible?


EDIT 1: I have just found concurrent_queue.h in the include folder - could this be my answer...?

Shaun Wilde
  • There is no such thing as a thread-safe queue without locking if you have multiple writers and multiple readers. But with a single reader and a single writer it can easily be made thread safe, and most implementations rely on this restriction. – Luka Rahne Aug 30 '11 at 12:31
  • Check out [Herb Sutter's implementation of a waitfree queue](http://drdobbs.com/cpp/212201163) (you need to make a free account to access the article); it uses atomic variables. – Kerrek SB Aug 30 '11 at 12:50
  • Since you found concurrent_queue, are you on VS2010? – RedX Aug 30 '11 at 13:16
  • I am, yes - I have just implemented it and am stress testing it to see what benefit it gives – Shaun Wilde Aug 30 '11 at 13:20

4 Answers

1

Okay, I'll add my own answer: concurrent_queue works very well.

Using the details described in this MSDN article, I implemented a concurrent queue (along with tasks and my first C++ lambda expression :) ). I didn't spend long thinking it through, though, as it is only a spike.

inline void AddVisitPoint(ULONG uniqueId) { m_queue.push(uniqueId); }

...
// somewhere else in code

m_tasks.run([this]
{
    ULONG id;
    while (true)
    {
        // Spin, yielding to other tasks, until a visit point is available.
        while (!m_queue.try_pop(id))
            Concurrency::Context::Yield();

        if (id == 0) break; // 0 is an unused id, so it is used to close the thread/task

        // Same work as the old AddVisitPoint, now off the hot path.
        CScopedLock<CMutex> lock(m_mutexResults);
        m_pVisitPoints->points[m_pVisitPoints->count].UniqueId = id;
        if (++m_pVisitPoints->count == VP_BUFFER_SIZE)
        {
            SendVisitPoints();
            m_pVisitPoints->count = 0;
        }
    }
});

Results:

  • Application without instrumentation = 9.3
  • Application with old instrumentation handler = 38.6
  • Application with new instrumentation handler = 16.2
Shaun Wilde
0

Here it mentions that not all container operations are thread safe on Windows, only a limited number of methods. And I don't believe the C++ standard says anything about thread-safe containers. I may be wrong, but I checked the standard and nothing came up.

DumbCoder
  • I've just been looking at http://blogs.msdn.com/b/nativeconcurrency/archive/2009/11/23/the-concurrent-queue-container-in-vs2010.aspx as well - which is funny as I've been using a similarly named class in C# – Shaun Wilde Aug 30 '11 at 12:29
0

Would it be possible to offload the client's communication into a separate thread? Then the inspection points can use thread local storage to record their hits and only need to communicate with a local thread to pass off a reference when full. The communication thread can then take its time to pass on the data to the actual collector since it's not on the hot path anymore.
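
The idea above could look roughly like the following. This is only a minimal sketch in standard C++11, not OpenCover code: `VisitId`, `kLocalBatch`, `BatchSink` and `AddVisitPointLocal` are hypothetical names, and it assumes a compiler that supports `thread_local` and `<mutex>`. Each thread records hits in its own buffer without locking and takes the lock only once per full batch.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

typedef unsigned long VisitId;      // stands in for ULONG (assumption)
const std::size_t kLocalBatch = 4;  // hypothetical batch size, kept tiny for illustration

// Hypothetical hand-off point that a communication thread would drain.
class BatchSink {
public:
    // Takes ownership of a full batch; the caller's vector is left empty.
    void submit(std::vector<VisitId>& batch) {
        std::lock_guard<std::mutex> lock(mutex_);
        batches_.push_back(std::vector<VisitId>());
        batches_.back().swap(batch);
    }

    // Called by the communication thread: removes and returns all pending batches.
    std::vector<std::vector<VisitId> > drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::vector<VisitId> > out;
        out.swap(batches_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<std::vector<VisitId> > batches_;
};

// Hot path: touches only thread-local data until the local buffer is full.
inline void AddVisitPointLocal(BatchSink& sink, VisitId id) {
    thread_local std::vector<VisitId> local;
    local.push_back(id);
    if (local.size() == kLocalBatch)
        sink.submit(local);
}
```

One trade-off to keep in mind: hits sitting in a partially filled thread-local buffer have not reached the sink yet, so they would be lost if the process is terminated without warning.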

David Schmitt
  • that is the plan - it's just getting the data from the call to the shared memory - I am using shared memory as the run-time can terminate the profiler at any time without warning so I want to capture as much data as I can and the host can read it. – Shaun Wilde Aug 30 '11 at 12:49
0

You could use a lock-free queue. Herb Sutter has some articles here.
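
To give a flavour of what such a queue looks like, here is a minimal sketch of a single-producer/single-consumer ring buffer built on `std::atomic`. It is not Herb Sutter's implementation; `SpscRing` is a hypothetical name, and the code assumes a C++11 compiler with `<atomic>`. It is only safe when exactly one thread pushes and exactly one thread pops, so a profiler whose visit points fire on many threads would need one ring per producing thread.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical SPSC ring buffer. One slot is always left unused to tell
// "full" from "empty", so it holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscRing {
public:
    SpscRing() : head_(0), tail_(0) {}

    // Producer side: returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire))
            return false; // full
        slots_[tail] = value;
        // Release: publishes the slot write before the new tail becomes visible.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side: returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false; // empty
        out = slots_[head];
        // Release: the slot may be reused only after it has been read.
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    T slots_[Capacity];
    std::atomic<std::size_t> head_; // next slot to read; written by the consumer only
    std::atomic<std::size_t> tail_; // next slot to write; written by the producer only
};
```

When `try_push` returns false the producer has to decide whether to spin, drop the item or fall back to a slower path; which of those is acceptable depends on how much data loss the collector can tolerate.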

duedl0r