
I am developing a message queue between two processes on Windows. I would like to support multiple producers and one consumer. The queue must not be corrupted by the crash of one of the processes; that is, the other processes are not affected by the crash, and when the crashed process is restarted it can continue communication (with the new, updated state).

Assume that the event objects in these snippets are wrappers for named Windows auto-reset events, and that the mutex objects are wrappers for named Windows mutexes (I used the C++ non-interprocess mutex type as a placeholder).
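For concreteness, such a wrapper could be sketched as follows for the single-process case. This is an assumption-laden placeholder: the class name and methods mirror the snippets below, but the real interprocess wrapper would hold a handle to a named kernel event (CreateEvent/WaitForSingleObject), not a condition variable.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical single-process stand-in for a named Windows auto-reset event.
// Set releases at most one waiter; the signal is latched if nobody is waiting.
class AutoResetEvent
{
public:
    void Set()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _signaled = true;
        _cv.notify_one();
    }

    void Wait() // the INFINITE-timeout case used in the snippets
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] { return _signaled; });
        _signaled = false; // auto-reset: consume the signal
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _signaled = false;
};
```

Note that, unlike the kernel object, this placeholder cannot report an abandoned wait, which the snippets below mention as a reason for re-checking the predicate in a loop.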

This is the producer side:

void producer()
{
    for (;;)
    {
        // Multiple producers modify _writeOffset so must be given exclusive access

        unique_lock<mutex> excludeProducers(_producerMutex);

        // A snapshot of the readOffset is sufficient because we use _notFullEvent.

        long readOffset = InterlockedCompareExchange(&_readOffset, 0, 0);

        // while is required because _notFullEvent.Wait might return because it was abandoned

        while (IsFull(readOffset, _writeOffset))
        {
            _notFullEvent.Wait(INFINITE);

            readOffset = InterlockedCompareExchange(&_readOffset, 0, 0);
        }

        // use a mutex to protect the resource from the consumer
        {
            unique_lock<mutex> lockResource(_resourceMutex);
            produce(_writeOffset);
        }

        // update the state

        InterlockedExchange(&_writeOffset, IncrementOffset(_writeOffset));
        _notEmptyEvent.Set();
    }
}

Similarly, this is the consumer side:

void consumer()
{
    for (;;)
    {
        long writeOffset = InterlockedCompareExchange(&_writeOffset, 0, 0);

        while (IsEmpty(_readOffset, writeOffset))
        {
            _notEmptyEvent.Wait(INFINITE);
            writeOffset = InterlockedCompareExchange(&_writeOffset, 0, 0);
        }

        {
            unique_lock<mutex> lockResource(_resourceMutex);
            consume(_readOffset);
        }

        InterlockedExchange(&_readOffset, IncrementOffset(_readOffset));
        _notFullEvent.Set();
    }
}
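For reference, the two snippets assume circular-buffer helpers roughly like the ones below. QUEUE_SIZE and the keep-one-slot-free convention are assumptions, since the question does not show these functions.

```cpp
#include <cassert>

// Hypothetical helpers assumed by the snippets above. One slot is kept
// unused so that a full queue and an empty queue are distinguishable
// without a separate element counter.
const long QUEUE_SIZE = 16; // assumed capacity (15 usable slots)

long IncrementOffset(long offset)
{
    return (offset + 1) % QUEUE_SIZE;
}

bool IsEmpty(long readOffset, long writeOffset)
{
    return readOffset == writeOffset;
}

bool IsFull(long readOffset, long writeOffset)
{
    return IncrementOffset(writeOffset) == readOffset;
}
```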

Are there any race conditions in this implementation? Is it indeed protected against crashes as required?

P.S. The queue meets the requirements if the state of the queue is protected. If the crash occurred within produce(i) or consume(i), the contents of those slots might be corrupted; other means will be used to detect and maybe even correct corruption of those slots. Those means are out of the scope of this question.

David Sackstein
    "Those means are out of the scope of this question." Probably not. If a program crashes it is almost always because it has experienced undefined behaviour, and so reasoning about it becomes moot. –  Sep 07 '17 at 21:40
  • This is close to being unanswerable. Look up the Halting Problem. – Kelly S. French Sep 07 '17 at 21:44
    I think it does not work even if nothing crashes. `excludeProducers` lock will prevent more than one producer from waiting to be able to write and from writing at all times. Basically this may only work in a single producer scenario. If producer dies right before `_notEmptyEvent.Set();` then consumer may get stuck forever. – user7860670 Sep 07 '17 at 21:44
  • @NeilButterworth Well, I am not saying this should not be dealt with; I am just saying that that problem can be solved and that the solution can be decoupled from this problem. – David Sackstein Sep 07 '17 at 22:32
  • @VTT I think you are wrong: 1. excludeProducers is released at the end of each iteration, so if two producers run this loop in two different threads, they may interleave. 2. If the producer dies right before _notEmptyEvent.Set then the data that it placed at _writeOffset will be lost and the consumer will continue to wait. But when the producer is restarted, it will write again to the same location, and if it manages to get to _notEmptyEvent.Set it will wake up the consumer. As far as the consumer is concerned, all is well. It just waited a little longer to get an item. – David Sackstein Sep 07 '17 at 22:41
  • why not use *IOCP* for your task? – RbMm Sep 07 '17 at 22:52
  • @Kelly, the Halting Problem just means that there is no algorithm that can prove whether an *arbitrary* piece of code halts or not. It doesn't mean that you can't prove correctness for a *specific* bit of code. I really don't think it is relevant here. – Harry Johnston Sep 09 '17 at 02:26
  • @RbMm, I didn't think IOCP could be used between different processes? At any rate, if the consumer were to crash, everything in the IOCP queue would be lost, so I don't think that approach would work. – Harry Johnston Sep 09 '17 at 22:34
  • @HarryJohnston - IOCP can be used between different processes if you create it with a name via `ZwCreateIoCompletion` + `ZwOpenIoCompletion` or use `OBJ_OPENIF` – RbMm Sep 09 '17 at 22:43

2 Answers


There is indeed a race condition in this implementation. Thank you @VTT for pointing it out.

@VTT wrote that if the producer dies right before `_notEmptyEvent.Set();`, then the consumer may get stuck forever.

Well, maybe not forever, because when the producer is resumed it will add an item and wake up the consumer again. But the state has indeed been corrupted. If, for instance, this happens QUEUE_SIZE times, the producer will see that the queue is full (IsFull() will return true) and it will wait. This is a deadlock.

I am considering the following solution to this, adding the commented code on the producer side. A similar addition should be made on the consumer side:

void producer()
{
    for (;;)
    {
        // Multiple producers modify _writeOffset so must be given exclusive access

        unique_lock<mutex> excludeProducers(_producerMutex);

        // A snapshot of the readOffset is sufficient because we use _notFullEvent.

        long readOffset = InterlockedCompareExchange(&_readOffset, 0, 0);

        // ====================== Added begin

        if (!IsEmpty(readOffset, _writeOffset))
        {
            _notEmptyEvent.Set();
        }

        // ======================= end Added

        // while is required because _notFullEvent.Wait might return because it was abandoned

        while (IsFull(readOffset, _writeOffset))

        // ... the rest of the loop is unchanged from the original producer
This will cause the producer to wake up the consumer whenever it gets the chance to run, if indeed the queue is now not empty. This looks more like a solution based on condition variables, which would have been my preferred pattern, were it not for the unfortunate fact that on Windows condition variables are not named and therefore cannot be shared between processes.

If this solution is voted correct, I will edit the original post with the complete code.

David Sackstein
  • The most obvious problem is that you are calling produce() while you still own _producerMutex, which defeats the purpose of having multiple producers. Also the way you are using _resourceMutex kind of defeats the point of having a producer/consumer model, if you're only doing one or the other at any given time. – Harry Johnston Sep 09 '17 at 00:28

So there are a few problems with the code posted in the question:

  • As already noted, there's a marginal race condition; if the queue were to become full, and all the active producers crashed before setting _notFullEvent, your code would deadlock. Your answer correctly resolves that problem by setting the event at the start of the loop rather than the end.

  • You're over-locking; there's typically little point in having multiple producers if only one of them is going to be producing at a time. Fixing that means producers can't write directly into shared memory; each will need a local cache. (It isn't impossible to have multiple producers writing directly into different slots in the shared memory, but it would make robustness much more difficult to achieve.)

  • Similarly, you typically need to be able to produce and consume simultaneously, and your code doesn't allow this.

Here's how I'd do it, using a single mutex (shared by both consumer and producer threads) and two auto-reset event objects.

void consumer(void)
{
    claim_mutex();
    for (;;)
    {
        if (!IsFull(*read_offset, *write_offset))
        {
            // Queue is not full, make sure at least one producer is awake
            SetEvent(notFullEvent);
        }

        while (IsEmpty(*read_offset, *write_offset))
        {
            // Queue is empty, wait for producer to add a message
            release_mutex();
            WaitForSingleObject(notEmptyEvent, INFINITE);
            claim_mutex();
        }

        release_mutex();
        consume(*read_offset);
        claim_mutex();

        *read_offset = IncrementOffset(*read_offset);
    }
}

void producer(void)
{
    claim_mutex();
    for (;;)
    {
        if (!IsEmpty(*read_offset, *write_offset))
        {
            // Queue is not empty, make sure consumer is awake
            SetEvent(notEmptyEvent);
        }

        if (!IsFull(*read_offset, *write_offset))
        {
            // Queue is not full, make sure at least one other producer is awake
            SetEvent(notFullEvent);
        }

        release_mutex();
        produce_in_local_cache();
        claim_mutex();

        while (IsFull(*read_offset, *write_offset))
        {
            // Queue is full, wait for consumer to remove a message
            release_mutex();
            WaitForSingleObject(notFullEvent, INFINITE);
            claim_mutex();
        }

        copy_from_local_cache_to_shared_memory(*write_offset);
        *write_offset = IncrementOffset(*write_offset);
    }
}
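To check this pattern end-to-end, the design can be sketched as a single-process translation. Everything here is an assumption standing in for the real interprocess objects: std::mutex for the shared named mutex, a condition-variable-based class for each named auto-reset event, a plain array for shared memory, and one producer/one consumer for the test run (the inter-producer notFullEvent hand-off is kept for fidelity).

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

namespace sketch {

// Stand-in for a named Windows auto-reset event (single-process only).
struct AutoResetEvent
{
    void Set()
    {
        std::lock_guard<std::mutex> lock(m);
        signaled = true;
        cv.notify_one();
    }
    void Wait()
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return signaled; });
        signaled = false; // auto-reset semantics
    }
    std::mutex m;
    std::condition_variable cv;
    bool signaled = false;
};

// Assumed circular-buffer parameters; one slot stays unused to tell
// "full" from "empty".
const long QUEUE_SIZE = 4;
long IncrementOffset(long o) { return (o + 1) % QUEUE_SIZE; }
bool IsEmpty(long r, long w) { return r == w; }
bool IsFull(long r, long w)  { return IncrementOffset(w) == r; }

std::mutex queueMutex;            // stands in for the shared named mutex
long readOffset = 0;
long writeOffset = 0;
int slots[QUEUE_SIZE];            // stands in for shared memory

AutoResetEvent notEmptyEvent;
AutoResetEvent notFullEvent;

void producer(int count)
{
    std::unique_lock<std::mutex> lock(queueMutex);
    for (int i = 0; i < count; ++i)
    {
        if (!IsEmpty(readOffset, writeOffset))
            notEmptyEvent.Set();  // make sure the consumer is awake

        int message = i;          // produce_in_local_cache(); the real code
                                  // releases the mutex around this step

        while (IsFull(readOffset, writeOffset))
        {
            lock.unlock();
            notFullEvent.Wait();
            lock.lock();
        }

        slots[writeOffset] = message; // copy_from_local_cache_to_shared_memory()
        writeOffset = IncrementOffset(writeOffset);
    }
    notEmptyEvent.Set();          // final wake so the consumer can drain
}

std::vector<int> consumer(int count)
{
    std::vector<int> received;
    std::unique_lock<std::mutex> lock(queueMutex);
    while ((int)received.size() < count)
    {
        if (!IsFull(readOffset, writeOffset))
            notFullEvent.Set();   // make sure a producer is awake

        while (IsEmpty(readOffset, writeOffset))
        {
            lock.unlock();
            notEmptyEvent.Wait();
            lock.lock();
        }

        received.push_back(slots[readOffset]); // consume(); the real code
                                               // releases the mutex here too
        readOffset = IncrementOffset(readOffset);
    }
    return received;
}

} // namespace sketch
```

In the interprocess version each of these stand-ins would be replaced by its named kernel-object counterpart, with the offsets and slots living in a shared-memory section.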
Harry Johnston
  • I refactored this code a little to what I believe is equivalent yet more modular. There is even the beginnings of a test. The code is at this link. https://drive.google.com/file/d/0B4waDAjVsjeDWTRWMjdOeURDZG8/view?usp=sharing Would you mind reviewing and letting me know if my reorganization is correct? – David Sackstein Oct 25 '17 at 23:30
  • ... it doesn't seem to solve the same problem, in that it will only work within a single process. As for correctness, I'm not familiar enough with the C++ synchronization functions, or the use of condition variables, to be sure. You could perhaps post it on Code Review? – Harry Johnston Oct 27 '17 at 21:19
  • Oh. Maybe I should clarify. Use of the condition variable is only within AutoResetEvent and is not part of the solution. On Windows I would use named Event objects. 2. I phrased this code like a single-process solution because that is how I tested it, but the intent is to use a named Windows mutex and shared memory in the interprocess implementation. – David Sackstein Oct 27 '17 at 22:16
  • Oh, I am sorry it was unclear; I should clarify. 1. The code was written in a single-process way because it was easier for me to use for basic correctness tests, but the intent is to use a Windows named mutex instead of std::mutex, Windows named events instead of the custom AutoResetEvent, and shared memory for the queue itself. 2. Condition variables are only used in the AutoResetEvent, which is not part of the solution, as explained above. – David Sackstein Oct 27 '17 at 22:22