
I have a singleton struct of getters and setters that I used to access through a std::unique_lock. All the getters shared the same lock, which essentially serialized every request to the singleton. The aforementioned solution worked but was slow. I was also worried about deadlocks, especially since most of the access to my data structure is reads; I'd say 70% of all accesses are reads. I started looking into atomic operations because there was some buzz about lock-free synchronization.

I've noticed a massive slowdown in my code base. First, let me preempt any suggestions about architecture: this code only runs on x86, so it's not a memory-fence issue. Second, I'm using acquire semantics for loads and release semantics for stores, which is exactly what x86 explicitly guarantees without memory fences.
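To spell out that claim, here is a minimal, standalone sketch (the names are illustrative, not from my code base) of the acquire/release mapping I'm relying on; on x86, each of these typically compiles to a plain MOV, while a seq_cst store is the case that costs extra (XCHG, or MOV plus MFENCE):

```cpp
#include <atomic>

std::atomic<bool> g_flag{false};

// Acquire load: on x86 this is an ordinary load instruction; the hardware
// memory model already provides the acquire ordering.
bool load_acquire() { return g_flag.load(std::memory_order_acquire); }

// Release store: likewise an ordinary store on x86, no fence emitted.
void store_release(bool b) { g_flag.store(b, std::memory_order_release); }
```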

My theory is that I'm not using the right memory_order for what I want to do (or at least for how my library is used in practice), or that it has something to do with my compiler.

I'm using MSVC 2015 Update 2 as my compiler, and the slowdown is most prevalent in debug builds.

struct Singleton_struct
{
    std::atomic_bool m_bIs1;
    std::atomic_bool m_bIs2;
    std::atomic_bool m_bIs3;
    Singleton_struct() : m_bIs1(false), m_bIs2(false), m_bIs3(false) {}

    void set1(bool bIs1);
    void set2(bool bIs2);
    void set3(bool bIs3);

    bool is1();
    bool is2();
    bool is3();
};

void Singleton_struct::set1(bool bIs1)
{
    m_bIs1.store(bIs1, std::memory_order_release);
}

void Singleton_struct::set2(bool bIs2)
{
    m_bIs2.store(bIs2, std::memory_order_release);
}

void Singleton_struct::set3(bool bIs3)
{
    m_bIs3.store(bIs3, std::memory_order_release);
}

bool Singleton_struct::is1()
{
    return m_bIs1.load(std::memory_order_acquire); 
}

bool Singleton_struct::is2()
{
    return m_bIs2.load(std::memory_order_acquire); 
}

bool Singleton_struct::is3()
{
    return m_bIs3.load(std::memory_order_acquire); 
}
Matthew Fisher
noztol
  • `the slow down is most prevalent on debug builds.` That's probably the issue. What kind of asm output is the compiler making when these functions inline into your actual code? Is it any slower for release builds? Your getters/setters look fine, so it's all a question of what you're inlining this into. (Or in a debug build, what's turning into a mess of function calls when these don't inline.) – Peter Cordes Aug 23 '16 at 23:36
  • Another possible issue: if you use these three atomic booleans independently from different threads, it's probably a bad thing that they're all in the same cache line ([false sharing](https://en.wikipedia.org/wiki/False_sharing)). I could imagine locking might actually have helped group writes to all three booleans together, if writers usually set more than one at once. Memory-ordering mis-speculation by the CPU hardware might be more common when you're not locking; try profiling [with performance counters](https://software.intel.com/en-us/forums/intel-performance-bottleneck-analyzer/topic/327956). – Peter Cordes Aug 23 '16 at 23:44
  • Do you even need acquire and release semantics? Would `memory_order_relaxed` work, or do these flags need to indicate that a buffer of data is ready? In the code you've posted, the flags are totally independent of each other. (This would only affect compile-time reordering when targeting x86; you're right that x86 loads/stores have acquire/release semantics for free.) See the [stdatomic tag wiki](http://stackoverflow.com/tags/stdatomic/info), especially the link to Jeff Preshing's article about acquire/release semantics: http://preshing.com/20120913/acquire-and-release-semantics/ – Peter Cordes Aug 23 '16 at 23:47
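Putting the comments' two suggestions together, a hypothetical variant might look like the sketch below (`Singleton_struct_v2` is an assumed name, and 64 bytes is an assumed x86 cache-line size; MSVC 2015 predates C++17's `std::hardware_destructive_interference_size`, hence the literal): relaxed ordering since the flags are independent, plus per-flag cache-line alignment to avoid false sharing.

```cpp
#include <atomic>

// Hypothetical variant: each flag on its own cache line, relaxed ordering.
struct Singleton_struct_v2
{
    alignas(64) std::atomic_bool m_bIs1{false};  // 64 = common x86 line size
    alignas(64) std::atomic_bool m_bIs2{false};
    alignas(64) std::atomic_bool m_bIs3{false};

    // Relaxed is enough if the flags don't publish other data.
    void set1(bool b) { m_bIs1.store(b, std::memory_order_relaxed); }
    bool is1() const  { return m_bIs1.load(std::memory_order_relaxed); }
    // set2/is2 and set3/is3 would follow the same pattern.
};
```

Whether relaxed ordering is safe depends on whether any flag is used to signal that other (non-atomic) data is ready, as the last comment notes.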

0 Answers