8

The std::atomic types allow atomic access to variables, but I would sometimes like non-atomic access, for example when the access is protected by a mutex. Consider a bitfield class that allows both multi-threaded access (via insert) and single-threaded vectorized access (via operator|=):

class Bitfield
{
    const size_t size_, word_count_;
    std::atomic<size_t> * words_;
    std::mutex mutex_;

public:

    Bitfield (size_t size) :
        size_(size),
        word_count_((size + 8 * sizeof(size_t) - 1) / (8 * sizeof(size_t)))
    {
        // make sure words are 32-byte aligned
        posix_memalign(&words_, 32, word_count_ * sizeof(size_t));
        for (int i = 0; i < word_count_; ++i) {
            new(words_ + i) std::atomic<size_t>(0);
        }
    }
    ~Bitfield () { free(words_); }

private:
    void insert_one (size_t pos)
    {
        size_t mask = size_t(1) << (pos % (8 * sizeof(size_t)));
        std::atomic<size_t> * word = words_ + pos / (8 * sizeof(size_t));
        word->fetch_or(mask, std::memory_order_relaxed);
    }
public:
    void insert (const std::set<size_t> & items)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        // do some sort of muti-threaded insert, with TBB or #pragma omp
        parallel_foreach(items.begin(), items.end(), insert_one);
    }

    void operator |= (const Bitfield & other)
    {
        assert(other.size_ == size_);
        std::unique_lock<std::mutex> lock1(mutex_, defer_lock);
        std::unique_lock<std::mutex> lock2(other.mutex_, defer_lock);
        std::lock(lock1, lock2); // edited to lock other_.mutex_ as well
        // allow gcc to autovectorize (256 bits at once with AVX)
        static_assert(sizeof(size_t) == sizeof(std::atomic<size_t>), "fail");
        size_t * __restrict__ words = reinterpret_cast<size_t *>(words_);
        const size_t * __restrict__ other_words
            = reinterpret_cast<const size_t *>(other.words_);
        for (size_t i = 0, end = word_count_; i < end; ++i) {
            words[i] |= other_words[i];
        }
    }
};

Note operator|= is very close to what's in my real code, but insert(std::set) is just attempting to capture the idea that one can

acquire lock;
make many atomic accesses in parallel;
release lock;

My question is this: what is the best way to mix such atomic and non-atomic access? Answers to [1,2] below suggest that casting is wrong (and I agree). But surely the standard allows such apparently safe access?

More generally, can one use a reader-writer-lock and allow "readers" to read and write atomically, and the unique "writer" to read and write non-atomically?

References

  1. How to use std::atomic efficiently
  2. Accessing atomic<int> of C++0x as non-atomic
Community
  • 1
  • 1
fritzo
  • 501
  • 5
  • 14

3 Answers3

5

Standard C++ prior to C++11 had no multithreaded memory model. I see no changes in the standard that would define the memory model for non-atomic accesses, so those get similar guarantees as in a pre-C++11 environment.

It is actually theoretically even worse than using memory_order_relaxed, because the cross thread behavior of non-atomic accesses is simply completely undefined as opposed to multiple possible orders of execution one of which must eventually happen.

So, to implement such patterns while mixing atomic and non-atomic accesses, you will still have to rely on platform specific non-standard constructs (for example, _ReadBarrier) and/or intimate knowledge of particular hardware.

A better alternative is to get familiar with the memory_order enum and hope to achieve optimum assembly output with a given piece of code and compiler. The end result may be correct, portable, and contain no unwanted memory fences, but you should expect to disassemble and analyze several buggy versions first, if you are like me; and there will still be no guarantee that the use of atomic accesses on all code paths will not result in some superfluous fences on a different architecture or a different compiler.

So the best practical answer is simplicity first. Design your cross-thread interactions as simple as you can make it without completely killing scalability, responsiveness or any other holy cow; have nearly no shared mutable data structures; and access them as rarely as you can, always atomically.

Jirka Hanika
  • 13,301
  • 3
  • 46
  • 75
  • 2
    +1 for the last paragraph. On the other hand, the cross-thread behavior of non-atomic access, provided they are all reads, is well defined; it's only when you throw in a write or two that you get undefined behavior. That's from [intro.multithread]/21 and its predecessors. And C++ prior to C++11 **did** have a memory model, but it didn't support multi-threaded applications. – Pete Becker Sep 02 '12 at 20:31
  • @PeteBecker - Thank you. Hm, [intro.multithread]/21 speaks of "actions" which I understand to include reads as well. I read [intro.multithread]/21 as saying "one non-atomically reading thread is OK, two threads are undefined - may crash or read minus one". I agree about the non-threaded memory model, I'll correct that. – Jirka Hanika Sep 02 '12 at 21:19
  • Thanks @JirkaHanika. Both options look good: rewriting with appropriate/weaker memory_order, and understanding memory barriers. I'll try disassembling as you suggest. (any tips appreciated) – fritzo Sep 03 '12 at 04:36
  • @fritzo - I would recommend starting with `memory_order_acq_rel`. When every reader is done, and a writer is ready for execution, use acquire-release pairs to transitively lead to releasing a global. A writer who acquires the mutex, still with `memory_order_acq_rel`, should be able to do any access using `memory_order_relaxed` after this point. Likewise for transitioning back to `memory_order_acq_rel`. But I did not actually try out this approach . – Jirka Hanika Sep 03 '12 at 17:06
4

If you could do this, you'd have (potentially) one thread reading/writing a data object using atomic accesses and another thread reading/writing the same data object without using atomic accesses. That's a data race, and the behavior would be undefined.

Pete Becker
  • 74,985
  • 8
  • 76
  • 165
  • Doesn't the mutex prevent data race, ensuring that one access must happen before the other? (as in [intro.multithread]/21) – fritzo Sep 02 '12 at 22:36
  • 1
    Only if you lock the same mutex around every access. A mutex produces a "happens before" if one thread unlocks the mutex and another thread then locks it; that gets you synchronization. Locking a mutex doesn't affect atomic operations; the two are independent. – Pete Becker Sep 02 '12 at 22:41
  • Thanks, but I'm still confused. In a simpler case wehere the parallel_foreach in insert(std::set) were just a for loop, and I used size_t instead of std::atomic everywhere, then parallel method calls should be data-race-free, correct? In another simple case, if there were no operator|= method, then all uses of insert(std::set) would be consistent, correct? How does the composition of these two correct behaviors lead to data-race? thanks again! – fritzo Sep 02 '12 at 23:01
  • You have to look at the entire program to determine whether the program has data races. If only one function accesses data and that function is never called from more than one thread at a time, there's no data race. If that function can be called from more than one thread at a time, it has to be protected from data races with a mutex or atomics. If there are two functions that access the same data at the same time, they both have to be protected from each other, either by both using atomics or by both using a mutex. – Pete Becker Sep 02 '12 at 23:08
1

In C++20 there is std::atomic_ref, which allows atomic operations on non-atomic data.

So you should be able to declare words_ as non-atomic size_t* and use std::atomic_ref<size_t> to do atomic operations when needed. But be aware of the requirements:

While any atomic_ref instances referencing an object exists, the object must be exclusively accessed through these atomic_ref instances. No subobject of an object referenced by an atomic_ref object may be concurrently referenced by any other atomic_ref object.

upd: In this particular case you probably also need std::shared_mutex to separate atomic "reader's" modifications from non-atomic "writer's" modifications.

magras
  • 1,709
  • 21
  • 32