
I have inherited an application whose performance I'm trying to improve. It currently uses a mutex (via std::lock_guard&lt;std::mutex&gt;) to transfer data from one thread to another. One thread is a low-frequency (slow) one which simply modifies the data to be used by the other.

The other thread (which we'll call fast) has rather stringent performance requirements (it needs to complete the maximum possible number of cycles per second) and we believe this is being impacted by the use of the mutexes.

Basically, the current logic is:

slow thread:             fast thread:
    occasionally:            very-often:
        claim mutex              claim mutex
        change data              use data
        release mutex            release mutex

In order to get the fast thread running at maximum throughput, I'd like to experiment with reducing the number of mutex locks it has to do.

I suspect a variation of the double-checked locking pattern may be of use here. I know it has serious issues with bi-directional data flow (or singleton creation) but the areas of responsibility in my case are a little more limited in terms of which thread performs which operations (and when) on the shared data.

Basically, the slow thread sets up the data and never reads or writes it again unless a new change comes in. The fast thread uses and changes the data but never expects to pass any information back to the other thread. In other words, the data flows strictly one way.

I wanted to see if anyone could pick any holes in the strategy I'm thinking of.


The new idea is to have two sets of data, one current and one pending. There is no need for a queue in my case as incoming data overwrites previous data.

The pending data will only ever be written to by the slow thread under the control of the mutex and it will have an atomic flag to indicate that it has written and relinquished control (for now).

The fast thread will continue to use current data (without the mutex) until such time as the atomic flag is set. Since it is responsible for transferring pending to current, it can ensure the current data is always consistent.

At the point where the flag is set, it will lock the mutex, transfer pending to current, clear the flag, unlock the mutex and carry on.

So, basically, the fast thread runs at full speed and only does mutex locks when it knows the pending data needs to be transferred.


Getting into more concrete details, the class will have the following data members:

std::atomic_bool m_newDataReady;
std::mutex       m_protectData;
MyStruct         m_pendingData;
MyStruct         m_currentData;

The method for receiving new data in the slow thread would be:

void NewData(const MyStruct &newData) {
    std::lock_guard<std::mutex> guard(m_protectData);
    m_newDataReady = false;
    m_pendingData = newData;   // transfer newData to pending
    m_newDataReady = true;
}

Clearing the flag prevents the fast thread from even trying to check for new data until the immediate transfer operation is complete.

The fast thread is a little trickier, using the flag to keep mutex locks to a minimum:

while (true) {
    if (m_newDataReady) {
        std::lock_guard<std::mutex> guard(m_protectData);
        if (m_newDataReady) {
            m_currentData = m_pendingData;   // transfer pending to current
            m_newDataReady = false;
        }
    }

    Use(m_currentData);
}

Now it appears to me that the use of this method in the fast thread could improve performance quite a bit:

  • There is only one place where the atomic flag is read outside the control of the mutex and, because it's atomic, that read is free of data races (though the value it returns may be stale).
  • Even if the value is stale, the second check inside the mutex-locked area provides a safety valve (the flag is rechecked when we know it's consistent).
  • The transfer of data is only ever performed under the control of the mutex so that should always be consistent.
  • The outer check in the fast thread means that unnecessary mutex locks will be avoided - they'll only be done if the flag is true (or "half-true", a possibly inconsistent state).
  • The inner if will take care of that "half-true" possibility where, between checking the flag and locking the mutex, the flag has been cleared.

I can't see any holes in this strategy but, given I'm only just getting into atomics/threading in the standard-C++ world, it may be I'm missing something.

Are there any clear problems in using this method?

paxdiablo
  • For all this, how much improvement did you actually see ? – Arunmu Nov 25 '16 at 06:02
  • If it's on an Intel processor then setting of the atomics by default is sequentially consistent and you won't be facing any ordering issues, but I am not sure of the default behaviour on other architectures. – Arunmu Nov 25 '16 at 06:08
  • The inner `if` could be an assert. Isn't it more of an invariant check? It should be an assert at least on a debug build. – Arunmu Nov 25 '16 at 06:10
  • 1
    @Arunmu, the improvement is pretty good, about 8% but I don't want that if the result is non-deterministic, potentially crashing code :-) It's actually running on ARM (iMX6) rather than Intel. I don't actually want it to assert in the inner `if` (or do nothing for release builds), that's actually there to prevent attempted transfers if the slow thread has snuck in to cancel a change before the fast thread has actioned it. That's not shown in the simplified example but it will be possible in the final solution. – paxdiablo Nov 25 '16 at 06:23
  • Ok. It would be better to explicitly provide the semantics of load and store of the atomic variable as sequentially consistent. You could probably use a slightly weaker acquire-release memory order but that's an uncharted territory for me. Otherwise, the solution looks safe to me, kind of reminded me of hazard pointers :) – Arunmu Nov 25 '16 at 06:39
  • Did you try just std::atomic without mutex? As far as I see you can go fully lock-free here. – Andrey Cheboksarov Nov 25 '16 at 10:29
  • @Andrey, when you say `std::atomic` alone, I'm not sure what you're referring to. The data structure itself is almost certainly too big to be able to be done atomically which is why I've done the atomic bool and mutex. – paxdiablo Nov 25 '16 at 11:35
  • If you keep a pointer to your data structure, then it's possible. If you want to change something in a slow thread, then you can create a new data structure instance and change it(RCU techique). In a fast thread you almost always will be reading same pointer for std::atomic. I can't say that it will outperform solution with std::mutex, but it may be worth trying – Andrey Cheboksarov Nov 25 '16 at 12:11

0 Answers