Safe removal of a node in a concurrent linked list

Question

I'm currently reading the book APUE. When I read the chapter about pthread reader/writer-lock, I have a question about its implementation of concurrent queue using reader/writer-lock.

struct queue {
    struct job *q_head;
    struct job *q_tail;
    pthread_rwlock_t q_lock;
};

/*
* Remove the given job from a queue.
*/
void
job_remove(struct queue *qp, struct job *jp)
{
    pthread_rwlock_wrlock(&qp->q_lock);
    if (jp == qp->q_head) {
        qp->q_head = jp->j_next;
        if (qp->q_tail == jp)
            qp->q_tail = NULL;
        else
            jp->j_next->j_prev = jp->j_prev;
    } else if (jp == qp->q_tail) {
        qp->q_tail = jp->j_prev;
        jp->j_prev->j_next = jp->j_next;
    } else {
        jp->j_prev->j_next = jp->j_next;
        jp->j_next->j_prev = jp->j_prev;
    }
    pthread_rwlock_unlock(&qp->q_lock);
}

My question is that how this implementation can ensure that a struct job is removed from the linked list only once. From my understanding, two threads can be scheduled so that they are just before the line pthread_rwlock_wrlock. Then the struct job *jp might be freed twice. If struct job * is a dynamically allocated data structure, this might lead to a double-free bug. Any advice?

What do you mean by "two threads can stop before the line pthread_rwlock_wrlock"? Why would the threads stop? — 9769953, Aug 09 '18 at 09:35
I mean these threads can be scheduled so that two threads are both running to that step. — yeshengm, Aug 09 '18 at 10:56
You would have to look inside the source for `pthread_rwlock_wrlock`, but my guess is that there is a single if statement that checks whether a lock is set or available. That should amount to a single machine instruction, meaning that's the point where one thread gets the lock, and the other thread will have to wait until the lock is free. — 9769953, Aug 09 '18 at 11:28
The posted code likely fails to compile. It's missing a closing brace right before `else if (jp == qp->q_tail)`. — Andrew Henle, Aug 09 '18 at 22:06

score 0 · Accepted Answer · answered Aug 09 '18 at 22:16

Your code has a race condition in it. Other changes can potentially happen to the queue between two threads obtaining a write lock on the queue and then trying to remove the same node. Other nodes can be removed, other nodes can be added. So if thread A removes a node, changes happen, and then thread B tries to remove the same node again, your queue can be corrupted.

The code needs information to let it know that the node has already been removed.

See the added comments, and the code to fix the race condition:

struct queue {
    struct job *q_head;
    struct job *q_tail;
    pthread_rwlock_t q_lock;
};

/*
* Remove the given job from a queue.
*/
void
job_remove(struct queue *qp, struct job *jp)
{
    // we assume here that jp is actually in the queue
    pthread_rwlock_wrlock(&qp->q_lock);

    // at this point, jp may no longer be in the queue,
    // and in fact, the queue may be completely different.
    // thus, any modification to the queue based on the
    // assumption that jp is still part of the queue
    // can lead to corruption of the queue

    // so check if jp has been removed - we'll later set
    // both j_next and j_prev to NULL after jp is
    // removed from the queue - and if they're both
    // NULL here that means another thread already
    // removed jp from the queue
    if ( !(jp->j_next) && !(jp->j_prev) ) {
        // empty statement - jp is already removed
        ;
    }
    else if (jp == qp->q_head) {
        qp->q_head = jp->j_next;
        if (qp->q_tail == jp)
            qp->q_tail = NULL;
        else
            jp->j_next->j_prev = jp->j_prev;
    } // and this brace was missing in your posted code...
    else if (jp == qp->q_tail) {
        qp->q_tail = jp->j_prev;
        jp->j_prev->j_next = jp->j_next;
    } else {
        jp->j_prev->j_next = jp->j_next;
        jp->j_next->j_prev = jp->j_prev;
    }

    // make sure data in jp no longer refers to
    // the queue - this will also tell any other
    // thread that accesses jp that it's already
    // been removed
    jp->j_next = NULL;
    jp->j_prev = NULL;

    pthread_rwlock_unlock(&qp->q_lock);
}

You also need to check how jp gets free()d or deleted. Multiple threads can't be allowed to do that.

So the hitch case you are trying to fix here is if there are two threads accessing that part of the code and another one is in the background constantly calling delete? — Alexandre Mercier Aubin, Aug 09 '18 at 22:44
That is to say some kind of C++'s shared_ptr or garbage collector is required to safely remove a dynamically allocated node? — yeshengm, Aug 10 '18 at 01:48
@AlexandreMercierAubin I'm not solving any particular case. I'm removing a race condition that allows a node to be removed more than one, potentially corrupting a queue that's in an unknown state. — Andrew Henle, Aug 10 '18 at 09:32
@manifold *That is to say some kind of C++'s shared_ptr or garbage collector is required to safely remove a dynamically allocated node?* No. While such constructs make multithreaded coding easier, they're not necessary. — Andrew Henle, Aug 10 '18 at 09:35
The other functions related to this struct are locked with the same variable and lock process. There shouldn't be a race condition unless the user willingly tries to mess with the struct by not using the functions that comes with it. — Alexandre Mercier Aubin, Aug 10 '18 at 11:33
@AlexandreMercierAubin *There shouldn't be a race condition unless the user willingly tries to mess with the struct by not using the functions that comes with it* Not correct. Thread A and B both try to delete node X. Node X->j_prev is node Y. Thread A runs, deletes node X from the list. **Then thread C runs and deletes node Y from the list.** When thread B then deletes node X **again**. the list is corrupted. Race condition. Full stop. — Andrew Henle, Aug 10 '18 at 11:39
Oh yeah, I see, the corruption would come from the fact that the second removal of X would reinstate the link from another node Z to Y. This would still run and be navigable until the queue_destroy is called to free dangling nodes. After that point, boom! :D — Alexandre Mercier Aubin, Aug 10 '18 at 12:13

Safe removal of a node in a concurrent linked list

1 Answers1