In the Migrating Away from Threads section of Apple's Concurrency Programming Guide there is a subsection, Changing Producer-Consumer Implementations, which claims that the typical multistep pthread mutex + condition variable implementation can be simplified using GCD:

With dispatch queues, you can simplify the producer and consumer implementations into a single call:

dispatch_async(queue, ^{
  // Process a work item.
});

When your producer has work to be done, all it has to do is add that work to a queue and let the queue process the item.

The Producer-Consumer problem is also known as the Bounded-Buffer problem, yet the above makes no mention of a buffer, its bound, or the consumer, let alone of blocking the producer and consumer to avoid overruns and underruns.

How can this be a valid solution?

Rhythmic Fistman
  • But what would be better in this case from just a performance perspective: having 2 threads running the producer and consumer simultaneously, started during system init, or going the GCD route, which dispatches an async task whenever the producer has more data to add (since dispatching also takes its own sweet time)? – ichthyocentaurs Jul 28 '16 at 07:01
  • The queue is the buffer. – James Bush Apr 30 '17 at 04:48

1 Answer

In the solution described in that Apple document:

  1. There is no buffer because no buffer is needed;
  2. System load is the bound;
  3. Consumers are tasks.

Say you have multiple producers and consumers: producers place data in a shared buffer, and consumers read data from that shared buffer. A semaphore or monitor is used to synchronise access to the shared buffer, and the buffer size is fixed so as to limit the amount of data being produced relative to the rate at which it is consumed, hence throttling the producer.
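
For reference, here is a minimal sketch of that traditional scheme (not from the Apple document; the buffer_put and buffer_get names are purely illustrative): a fixed-size ring buffer guarded by a pthread mutex and two condition variables.

#include <pthread.h>
#include <stddef.h>

#define BUFFER_SIZE 8

static int buffer[BUFFER_SIZE];
static size_t head, tail, count;
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

// Producer side: blocks while the buffer is full.
void buffer_put(int item) {
    pthread_mutex_lock(&lock);
    while (count == BUFFER_SIZE)
        pthread_cond_wait(&not_full, &lock);
    buffer[tail] = item;
    tail = (tail + 1) % BUFFER_SIZE;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

// Consumer side: blocks while the buffer is empty.
int buffer_get(void) {
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &lock);
    int item = buffer[head];
    head = (head + 1) % BUFFER_SIZE;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return item;
}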

Under Grand Central Dispatch, consumers are tasks dispatched to a queue. Since tasks are Objective-C blocks, a producer doesn’t need a buffer to tell a consumer about the data it should process: Objective-C blocks automatically capture objects they reference.

For example:

// Producer implementation
while (…) {
    id dataProducedByTheProducer;

    // Produce data and place it in dataProducedByTheProducer
    dataProducedByTheProducer = …;

    // Dispatch a new consumer task
    dispatch_async(queue, ^{
        // This task, which is an Objective-C block, is a consumer.
        //
        // Do something with dataProducedByTheProducer, which is
        // the data that would otherwise be placed in the shared
        // buffer of a traditional, semaphore-based producer-consumer
        // implementation.
        //
        // Note that an Objective-C block automatically keeps a
        // strong reference to any Objective-C object referenced
        // inside of it, and the block releases said object when
        // the block itself is released.

        NSString *s = [dataProducedByTheProducer …];
    });
}

The producer may dispatch as many consumer tasks as it has data to produce. However, this doesn't mean that GCD will start those consumer tasks at the same rate: GCD uses operating-system information to control the number of tasks that execute concurrently according to the current system load. The producer itself isn't throttled, and in most cases it doesn't have to be, because of GCD's intrinsic load balancing.

If there's an actual need to throttle the producer, one solution is to have a master that dispatches n producer tasks and have each consumer notify the master (via a task dispatched after the consumer has finished its job) that it is done, at which point the master dispatches another producer task. Alternatively, the consumer itself could dispatch a new producer task upon completion.
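
As a concrete illustration of re-introducing an explicit limit, here is a sketch of a related approach using a counting dispatch semaphore rather than the master/notification scheme described above. It assumes ARC; the global queue, the limit of 8 and the NSNumber payload are all placeholders. The producer blocks whenever 8 items have been produced but not yet consumed, which is the moral equivalent of a bounded buffer of size 8.

#import <Foundation/Foundation.h>
#include <dispatch/dispatch.h>

int main(void) {
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();

    // At most 8 items may be produced but not yet consumed,
    // the equivalent of a bounded buffer of size 8.
    dispatch_semaphore_t slots = dispatch_semaphore_create(8);

    for (NSUInteger i = 0; i < 100; i++) {
        // The producer blocks here whenever all 8 slots are taken.
        dispatch_semaphore_wait(slots, DISPATCH_TIME_FOREVER);

        NSNumber *item = @(i); // the produced data, captured by the block

        dispatch_group_async(group, queue, ^{
            // Consumer task: process the captured item.
            NSLog(@"consumed %@", item);

            // Free a slot so that the producer can continue.
            dispatch_semaphore_signal(slots);
        });
    }

    // Wait for any outstanding consumer tasks before exiting.
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    return 0;
}

Here dispatch_semaphore_wait plays the role of the "buffer full" check in the traditional implementation, and dispatch_semaphore_signal plays the role of the "slot freed" notification.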

Specifically answering the points you raised:

The Producer-Consumer problem is also known as the Bounded-Buffer problem, yet the above makes no mention of a buffer

A shared buffer isn’t needed because consumers are Objective-C blocks, which automatically capture data that they reference.

its bound

GCD bounds the number of dispatched tasks according to the current system load.

or the consumer

Consumers are the tasks dispatched to GCD queues.

let alone blocking the producer & consumer in order to avoid over/under runs

There's no need for blocking since there's no shared buffer. As each consumer is an Objective-C block that captures the produced data via the block's context-capturing mechanism, there's a one-to-one relation between consumer and data.

  • Wow, this is great. I've got to think about this. Thanks! – Rhythmic Fistman Oct 31 '11 at 08:53
  • Ok, I get it now. It's a mental change of gears that simplifies things. I'm a little apprehensive about the opaque queue, although I see that I can do almost everything I could before, except maybe take down unfinished consumer tasks. I'd use this in new code, but for the problem I had in mind I can't as my consumer is scheduled on a real thread. Thanks for the brilliant answer, too bad you're not writing the doco. – Rhythmic Fistman Nov 03 '11 at 10:47
  • What I don't like about this solution is that the producer decides how the consumer consumes the data (it has to specify the block). Whereas I might want the consumer to be able to consume data from various places in the code depending on what step of processing it is in. – user102008 Dec 26 '12 at 01:01
  • @user102008 You can include references to (or otherwise make accessible) those other data within the block, or have different kinds of blocks (some kind of block factory, perhaps?). – Richard Jul 23 '13 at 14:12
  • @user102008, the block could simply invoke `[SomeConsumerClass consumeData:dataProducedByTheProducer]` or something like that. That abstracts away the details of what the consumer does with the data. – Ken Thomases Dec 23 '13 at 03:55
  • Not needing to specify _what_ is produced and consumed is interesting, but letting the bound be system load is an anti-pattern. Apple says you can simplify your producer-consumer code using GCD, but the cost for this "simplification" is throwing away your well defined limits (the buffer size). No wonder the code is shorter. Their advice in this case is overly simplified. But can the example be salvaged? Can a limit be added? – Rhythmic Fistman Jan 05 '17 at 01:16