
Hi, I am trying to understand a lock-free work-stealing dequeue implementation. Currently, I am reading the one in Google's Filament here. I am mostly concerned about the steal operation.

template <typename TYPE, size_t COUNT>
TYPE WorkStealingDequeue<TYPE, COUNT>::steal() noexcept {
    do {
        // mTop must be read before mBottom
        int32_t top = mTop.load(std::memory_order_seq_cst);

        // mBottom is written concurrently to the read below in pop() or push(), so
        // we need basic atomicity. Also makes sure that writes made in push()
        // (prior to mBottom update) are visible.
        int32_t bottom = mBottom.load(std::memory_order_acquire);

        if (top >= bottom) { 
            // queue is empty
            return TYPE();
        }

        // The queue isn't empty
        TYPE item(getItemAt(top));
        if (mTop.compare_exchange_strong(top, top + 1,
                std::memory_order_seq_cst,
                std::memory_order_relaxed)) {
            // success: we stole a job, just return it.
            return item;
        }
        // failure: the item we just tried to steal was pop()'ed under our feet,
        // simply discard it; nothing to do.
    } while (true);
}

I am wondering whether it would be correct to change the initial mTop.load to memory_order_relaxed and the subsequent mBottom.load to memory_order_seq_cst. This should still preserve the order of mTop.load and mBottom.load, right? The memory_order_seq_cst load should still prevent the earlier memory_order_relaxed load from being reordered past it, right?
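To make the question concrete, this is roughly the variant I am asking about, written as a self-contained sketch. VariantDeque, its fixed-size array and the simplified push() are made up for illustration and are not Filament's code; the only intended difference from the steal() above is the memory order on the two loads. Whether this variant is actually correct is exactly what I am unsure about.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the real class: just enough state to show the loads.
template <typename TYPE, size_t COUNT>
struct VariantDeque {
    static_assert((COUNT & (COUNT - 1)) == 0, "COUNT must be a power of two");

    std::atomic<int32_t> mTop{0};
    std::atomic<int32_t> mBottom{0};
    TYPE mItems[COUNT] = {};

    TYPE& getItemAt(int32_t index) noexcept {
        return mItems[size_t(index) & (COUNT - 1)];
    }

    // Simplified owner-side push (assumption: one owner thread, no pop()).
    void push(TYPE item) noexcept {
        int32_t bottom = mBottom.load(std::memory_order_relaxed);
        assert(bottom - mTop.load(std::memory_order_relaxed) < int32_t(COUNT));
        getItemAt(bottom) = item;
        mBottom.store(bottom + 1, std::memory_order_release);
    }

    TYPE steal() noexcept {
        do {
            // Variant: relaxed here (the original uses seq_cst).
            int32_t top = mTop.load(std::memory_order_relaxed);

            // Variant: seq_cst here (the original uses acquire).
            int32_t bottom = mBottom.load(std::memory_order_seq_cst);

            if (top >= bottom) {
                // queue is empty
                return TYPE();
            }

            TYPE item(getItemAt(top));
            if (mTop.compare_exchange_strong(top, top + 1,
                    std::memory_order_seq_cst,
                    std::memory_order_relaxed)) {
                return item;
            }
            // CAS failed: someone else took the item, retry.
        } while (true);
    }
};
```

Used from a single thread, this behaves like a FIFO from the thief's side: after push(11) and push(22), two steal() calls return 11 and then 22, and a further call returns a default-constructed TYPE. That says nothing about the multi-threaded ordering, which is the part I am asking about.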

curiousguy
kevinyu
  • C/C++ semantics does **not** deal with "reordering". Any reasoning based on "reordering" may be applicable to practical implementations of today but not the standard. – curiousguy May 20 '19 at 03:07
  • Please post the complete MT code, that is all the parts that publish and consume data. – curiousguy May 22 '19 at 15:33

2 Answers


You should reason about the code in terms of the order in which its memory operations must occur for it to stay correct, and how that ordering affects inter-thread synchronization. std::memory_order lets the programmer express such constraints and leaves it up to the compiler to emit the necessary memory fences to realize them.

The code already expresses exactly what it needs to do: sequence mTop.load() before mBottom.load() and synchronize the mBottom.store() in push() with mBottom.load() in steal().
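As a rough illustration of that pairing, here is a minimal sketch. SketchDeque, kCapacity and the bool-returning steal() are hypothetical names of mine, there is no pop() and no wrap-around safety, and it is not Filament's code; it only shows the release store in push() that the acquire load in steal() synchronizes with.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int32_t kCapacity = 64;  // hypothetical fixed capacity

struct SketchDeque {
    std::atomic<int32_t> mTop{0};
    std::atomic<int32_t> mBottom{0};
    int mItems[kCapacity] = {};

    // Owner thread only.
    void push(int item) noexcept {
        int32_t bottom = mBottom.load(std::memory_order_relaxed);
        assert(bottom - mTop.load(std::memory_order_relaxed) < kCapacity);
        mItems[bottom % kCapacity] = item;  // plain write of the payload
        // Release: the payload write above happens-before any acquire load
        // that reads the value stored here.
        mBottom.store(bottom + 1, std::memory_order_release);
    }

    // Any thread. Returns false if the deque looked empty.
    bool steal(int& out) noexcept {
        for (;;) {
            // Sequenced before the load of mBottom below.
            int32_t top = mTop.load(std::memory_order_seq_cst);
            // Acquire: pairs with the release store in push(), so the
            // payload at indices below `bottom` is visible.
            int32_t bottom = mBottom.load(std::memory_order_acquire);
            if (top >= bottom) {
                return false;
            }
            int item = mItems[top % kCapacity];
            if (mTop.compare_exchange_strong(top, top + 1,
                    std::memory_order_seq_cst,
                    std::memory_order_relaxed)) {
                out = item;
                return true;
            }
            // Lost the race for this slot; retry with fresh values.
        }
    }
};
```

The point of the sketch is only where the two orderings sit: the store to mBottom is the last thing push() does, and steal() reads mTop before mBottom and reads the payload only after the acquire load.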

Eric

Yes, I agree with your conclusion that you can relax the operations on top, but you have to make the operation on bottom sequentially consistent. In fact, that is how I have implemented my Chase-Lev work-stealing deque: https://github.com/mpoeter/xenium/blob/master/xenium/chase_work_stealing_deque.hpp

mpoeter