3

Here's a fine-grained locking queue introduced by Anthony Williams in chapter 6.2.3 C++ Concurrency in Action.

/*
    pop only need lock head_mutex and a small section of tail_mutex,push only need
    tail_mutex mutex.maximum container concurrency.
*/
template<typename T> class threadsafe_queue
{
    private:
    struct node
    {
        std::shared_ptr<T> data;
        std::unique_ptr<node> next;
    }
    std::mutex head_mutex;   //when change the head lock it.
    std::unique_ptr<node> head;  
    std::mutex tail_mutex;   //when change the tail lock it.
    node* tail;
    std::condition_variable data_cond;

    node* get_tail()
    {
        std::lock_guard<std::mutex> tail_lock(tail_mutex);
        return tail;
    }

    public:
    /* 
        create a dummy node
    */
    threadsafe_queue():
        head(new node),tail(head.get())
    {}

    std::shared_ptr<T> wait_and_pop()
    {
        std::unique_lock<std::mutex> head_lock;
        data_cond.wait(head_lock,[&]{return head.get()!=get_tail();}); //#1
        std::unique_ptr<node> old_head=std::move(head);
        head=std::move(old_head->next);
        return old_head;
    }

    void push(T new_value)
    {
        std::shared_ptr<T> new_data(
        std::make_shared<T>(std::move(new_value)));
        std::unique_ptr<node> p(new node);
        {
            std::lock_guard<std::mutex> tail_lock(tail_mutex);
            tail->data=new_data;
            node* const new_tail=p.get();
            tail->next=std::move(p);
            tail=new_tail;
        }
        data_cond.notify_one();
    }
}

Here's the situation: There are two threads (thread1 and thread2). thread1 is doing a wait_and_pop and thread2 is doing a push. The queue is empty.

thread1 is in #2, had already checked head.get()!=get_tail() before data_cond.wait(). At this time its CPU period had run out. thread2 begins.

thread2 finished the push function and did data_cond.notify_one(). thread1 begins again.

Now thread1 begins data_cond.wait(), but it waits forever.

Would this situation possibly happen ?If so, how to get this container fixed ?

Beryllium
  • 12,808
  • 10
  • 56
  • 86
JoeyMiu
  • 139
  • 10

2 Answers2

7

Yes, the situation described in the OP is possible and will result in notifications being lost. Injecting a nice big time delay in the predicate function makes it easy to trigger. Here's a demonstration at Coliru. Notice how the program takes 10 seconds to complete (length of the timeout to wait_for) instead of 100 milliseconds (time when the producer inserts an item in the queue). The notification is lost.

There is an assumption implicit in the design of condition variables that the state of the condition (return value of the predicate) cannot change while the associated mutex is locked. This is not true for this queue implementation since push can change the "emptiness" of the queue without holding head_mutex.

§30.5p3 specifies that wait has three atomic parts:

  1. the release of the mutex, and entry into the waiting state;
  2. the unblocking of the wait; and
  3. the reacquisition of the lock.

Note that none of these mention checking of the predicate, if any was passed to wait. The behavior of wait with a predicate is described in §30.5.1p15:

Effects:

while (!pred())
      wait(lock);

Note that there is no guarantee here either that the predicate check and the wait are performed atomically. There is a pre-condition that lock is locked and it's associated mutex held by the calling thread.

As far as fixing the container to avoid loss of notifications, I would change it to a single mutex implementation and be done with it. It's a bit of a stretch to call it fine-grained locking when the push and pop both end up taking the same mutex (tail_mutex) anyway.

Casey
  • 41,449
  • 7
  • 95
  • 125
0

data_cond.wait() checks the condition every time it is woken up. So even though it may have already been checked, it will be checked again after data_cond.notify_one(). At that point, there is data to be popped (because Thread 2 had just completed a push), and so it returns. Read more here.

The only thing you should be worrying about is when you call wait_and_pop on an empty queue and then never push any more data onto it. This code does not have a mechanism for timing out a wait and returning an error (or throwing an exception).

paddy
  • 60,864
  • 6
  • 61
  • 103
  • I think data_cond.wait() is not a atomic function.It looks like while(flag!=true) condition.wait().if thread1 had check flag,but didn't condition.wait().then thread2 begin,finished with a notify.thread1 begin again ,fall into wait.What would happen in this situation. – JoeyMiu Aug 01 '13 at 03:25
  • Note that the wait predicate uses `get_tail()` which acquires a lock on `tail_mutex`. There is no race condition, because the predicate will block while a push operation is in progress. Locks are smart enough that if one is blocked by another, its thread will be woken up when the lock is released. – paddy Aug 01 '13 at 03:51
  • But if queue is empty, thread1 had finish head.get()!=get_tail() was going to wait on condition variable,then thread2 began push and finished before thread1 excute wait. Then thread1 had wait forever though the queue is not empty – JoeyMiu Aug 01 '13 at 04:07
  • Ah, now I understand what you're asking. If that happens, then I expect the wait should return immediately because the `data_cond` object is in the signalled state. However, I don't know this for sure. I suggest you set up a simple test scenario where one thread notifies the `condition_variable` before the other thread waits on it. – paddy Aug 01 '13 at 04:47
  • @joeymiu I've done a bit of digging around, and found [this old article about boost condition variables](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2406.html) as well as [this StackOverflow question](http://stackoverflow.com/a/15907440/1553090). From what I gather, the `condition_variable` is guaranteed to be atomic between the predicate test and the unlock-and-wait phase, so that it's not possible to miss a notification. And yes, the notification only wakes up threads that are waiting, not threads that haven't yet waited. In summary, you don't need to worry about a data race. – paddy Aug 01 '13 at 05:15
  • thanks a lot paddy,but I didn't see any evidence that prove it's atomic.It only says it's atomic between release lock and sleep . – JoeyMiu Aug 01 '13 at 06:33
  • @paddy Read the comments on the [accepted answer to the question you linked](http://stackoverflow.com/a/15907440/923854), particularly [this one by the answerer](http://stackoverflow.com/questions/15887306/do-i-need-to-synchronize-stdcondition-variable-condition-variable-anynotify/15907440#comment23045214_15907440) that states: "[Notifications can be lost] only if your program is broken. If the predicate is false when you check it, how can it become true before unlocking the mutex?". – Casey Aug 01 '13 at 08:22