First point: they're all kernel objects, so using any of them involves a switch from user mode to kernel mode. That switch imposes enough overhead by itself that you're unlikely to notice any real difference in speed between them. Therefore, which one is preferable depends a great deal on how you structure the data in the shared memory region and how you use it.
Let's start with what is probably the simplest case: the shared memory region forms the bottleneck. Whenever the consumer isn't reading, the producer is writing, and vice versa. At least at first glance, this looks like a case where we can use a single mutex. The producer waits on the mutex, writes data, and releases the mutex. The consumer waits on the mutex, reads data, and releases the mutex. This continues until everything is done.
Unfortunately, while this prevents the producer and consumer from using the shared region at the same time, it doesn't ensure they take turns. For example, the producer writes a buffer full of information, then releases the mutex. It then waits on the mutex again so it can write more data once the consumer is done. At that point, however, there's no guarantee that the consumer will be the next one to get the mutex. The producer might get it right back and write new data over what it just produced, so the consumer never sees the previous data.
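To make that failure concrete, here is a small deterministic sketch. It uses Python's `threading.Lock` as an in-process stand-in for the Windows mutex, and a dict as a stand-in for the shared region; the names `produce`, `consume`, and `shared` are purely illustrative. Rather than relying on a lucky thread schedule, it simply plays out one legal ordering in which the producer reacquires the mutex first:

```python
import threading

shared = {"buf": None}   # hypothetical stand-in for the shared memory region
lock = threading.Lock()  # stand-in for the kernel mutex
seen = []                # what the consumer actually received

def produce(item):
    with lock:
        shared["buf"] = item  # overwrites whatever is already there

def consume():
    with lock:
        seen.append(shared["buf"])

# The mutex guarantees mutual exclusion, not alternation.
# This is one perfectly legal order of acquisitions:
produce("request 1")
produce("request 2")  # producer got the mutex back before the consumer did
consume()
print(seen)  # prints ['request 2']; "request 1" was silently lost
```

Every access was properly locked, yet data was still lost, which is exactly why a mutex alone isn't enough here.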
One way to prevent that is to use a pair of events: one from the producer to the consumer saying there's data waiting to be read, and one from the consumer to the producer saying all the data in the buffer has been read. The producer waits on its event, which the consumer sets only when it's done reading (this event starts out signaled, since the buffer is initially empty). The producer then writes some data and signals the consumer's event to say data is ready. The consumer reads the data, then signals the producer's event so the cycle can continue.
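A minimal sketch of that two-event handshake, assuming Python threads in place of separate processes and `threading.Event` in place of Windows event objects. One difference to watch: Python's `Event` behaves like a Windows *manual-reset* event, so each side clears its event by hand after waiting; with Windows auto-reset events, a satisfied wait would reset the event for you. The `None` end-of-data marker and all function names are hypothetical, chosen just for the illustration.

```python
import threading

buffer = [None]                   # stand-in for the shared memory region
buffer_empty = threading.Event()  # consumer -> producer: "I've read everything"
data_ready = threading.Event()    # producer -> consumer: "there's data waiting"
buffer_empty.set()                # starts signaled: the buffer is empty at first

def producer(items):
    for item in list(items) + [None]:  # None is a hypothetical end marker
        buffer_empty.wait()            # wait until the consumer has drained the buffer
        buffer_empty.clear()           # manual-reset, so reset it ourselves
        buffer[0] = item               # write into the shared region
        data_ready.set()               # tell the consumer data is waiting

def consumer(out):
    while True:
        data_ready.wait()              # wait for the producer's signal
        data_ready.clear()
        item = buffer[0]               # copy the data out of the shared region
        buffer_empty.set()             # hand the buffer back right away
        if item is None:
            break
        out.append(item)

received = []
t_consumer = threading.Thread(target=consumer, args=(received,))
t_producer = threading.Thread(target=producer, args=(["req 1", "req 2", "req 3"],))
t_consumer.start()
t_producer.start()
t_producer.join()
t_consumer.join()
print(received)  # prints ['req 1', 'req 2', 'req 3']
```

Because each event has exactly one waiter and is only set after the other side has finished its step, the producer and consumer are forced to strictly alternate. Across real processes you'd use named Windows event objects (or Python's `multiprocessing.Event` plus a shared memory block); threads just keep the sketch self-contained.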
As long as you have only a single producer and a single consumer, and treat the entire buffer as a single "chunk" of data that's controlled as a unit, that's adequate. It can, however, lead to a problem. Consider, for example, a web server front-end as the producer and a back-end as the consumer (with some separate mechanism for passing results back to the web server). If the buffer is only big enough to hold one request, the producer may have to queue up several incoming requests while the consumer is processing one. Each time the consumer is ready for another request, the producer has to stop what it's doing, copy a request into the buffer, and let the consumer know it can proceed.
The basic point of separate processes, however, is to let each proceed on its own schedule as much as possible. To allow that, we might make room in the shared buffer for a number of requests. At any given time, some number of those slots will be full (or, looking at it from the other direction, some number will be free). For this case, we pretty much need counting semaphores to track those slots: one counting free slots and one counting filled slots. The producer can write something any time at least one slot is free. The consumer can read something any time at least one slot is filled.
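Here is a rough sketch of that slot-based scheme as a classic bounded ring buffer, again assuming Python threads as stand-ins for the processes and `threading.Semaphore` as a stand-in for a Windows semaphore object. The slot count `SLOTS`, the `None` end marker, and the function names are hypothetical. One semaphore starts at the number of slots and counts free ones; the other starts at zero and counts filled ones.

```python
import threading

SLOTS = 4                                # hypothetical buffer capacity
ring = [None] * SLOTS                    # stand-in for the shared region
free_slots = threading.Semaphore(SLOTS)  # counts empty slots
filled_slots = threading.Semaphore(0)    # counts slots holding data

def producer(items):
    write_idx = 0                        # touched only by the producer
    for item in list(items) + [None]:    # None is a hypothetical end marker
        free_slots.acquire()             # blocks only if every slot is full
        ring[write_idx] = item
        write_idx = (write_idx + 1) % SLOTS
        filled_slots.release()           # one more slot ready to read

def consumer(out):
    read_idx = 0                         # touched only by the consumer
    while True:
        filled_slots.acquire()           # blocks only if every slot is empty
        item = ring[read_idx]
        read_idx = (read_idx + 1) % SLOTS
        free_slots.release()             # slot can be reused by the producer
        if item is None:
            return
        out.append(item)

received = []
requests = [f"req {i}" for i in range(10)]
t_consumer = threading.Thread(target=consumer, args=(received,))
t_producer = threading.Thread(target=producer, args=(requests,))
t_consumer.start()
t_producer.start()
t_producer.join()
t_consumer.join()
print(received == requests)  # prints True
```

With ten requests and only four slots, the producer can run up to four requests ahead of the consumer before it has to wait, instead of stopping after every single one. Because there is exactly one producer and one consumer, each index is owned by one side and needs no extra protection; with multiple producers or multiple consumers you'd also need a mutex around the index updates.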
Bottom line: the choice isn't about speed. It's about how you use and structure the data, and how the processes access it. Assuming it's really as simple as you describe, the pair of events is probably the simplest mechanism that will work.