How do monitors enforce mutual exclusion?

Question

So I read that monitors enforce mutual exclusion, mostly by using 2 procedures wait() and notify()/signal() and I also understand problems solved using the monitor.

What I want to know is how does a monitor enforce mutual exclusion? What happens inside wait() and notify()?

Why can only 1 thread invoke a monitor procedure? How is this constraint enforced?

(I am sorry if I am not very clear)

Are you looking for a high level view (i.e. a monitor is a mutex and a condition variable strapped together), a mid level view (here's why we use condition variables rather than waiting on mutexes), or a low level view (here is how a CPU uses the bus lock to do multithreading)? — Cort Ammon, Jul 08 '14 at 17:43
Low level! OR mid level will do too. I just want to know how wait() and notify() methods are defined — user2684198, Jul 08 '14 at 18:45

score 2 · Answer 1 · answered Jul 08 '14 at 19:59

A monitor is a combination of two threading primitives: the mutex and the condition variable.

A mutex is an object that can be "locked" by only one thread at a time. If a second thread attempts to "lock" the mutex, the second thread is forced to wait until the first thread "unlocks" the mutex.

There are many fast implementations of mutexes which use atomic operations to lock and unlock without kernel help, but all of them eventually fall back on asking the OS to lock the mutex for them. The OS will try to lock the mutex for you, but if it is already locked, the OS will simply suspend the thread and mark that it should be reawakened when that particular mutex is unlocked

A condition variable is a particularly interesting beast which solves a few particular problems with mutexes. In particular, waiting on a mutex is not interruptable. Once you start, you are comited to waiting for the mutex to come available. You can't interrupt a mutex wait with, say, an exception. Mutexes simply gave up that capability in return for their raw speed (they are MUCH faster than other threading primitives).

Also, mutexes are not always guaranteed to be fair, and some algorithms really have trouble with unfair wait times. It is much easier to make a condition variable be fair, because it doesn't have to have the same runtime requirements.

Of course, trying to solve these issues while playing nicely with mutexes is tricky. Condition variables have the following functions

wait(Mutex m) - atomically unlocks 'm' and begins waiting on the mutex. Upon waking, it will re-acquire 'm'
notify() - wake up one thread which is waiting on the condition variable
broadcast() - wake up every thread which is waiting on the condition variable.

Wait is the oddest of functions. Why it has to be defined in that way is well beyond the scope of the question.

With these primitives, one can pseudocode the monitor

class Monitor
{
    private Mutex mutex;
    private ConditionVariable cond;
    public Monitor()
    {
        mutex = new Mutex();
        cond = new ConditionVariable();
    }

    public void enter()
    {
        mutex.lock();
    }


    public void exit()
    {
        mutex.unlock();
    }

    public void wait()
    {
        // mutex should be locked already
        cond.wait(mutex); // wait unlocks the mutex while waiting, relocks after waiting
    }

    public void notify()
    {
        cond.notify();
    }

    public void broadcast()
    {
        cond.broadcast();
    }
};

As to the particular functions you were interested in:

wait() is called when a thread holds the lock. At the OS level, it adds the current thread to a queue of threads to wake up on the condition variable, releases the lock, and tells the scheduler to put the thread to sleep (all three happen "atomically," meaning the OS won't stop half way and switch to another thread). When it is woken up (with a notify or broadcast), it will reacquire the lock before returning to your code.
notify() looks at the first thread in the queue, removes it from the queue, and tells the OS scheduler to begin scheduling that thread again.
mutex locking is usually done using a special operation called "compare/exchange" (or one of the variants on it). Compare exchange does the following algorithm on an "atomic value":
- The user passes in a "compare" value and an "exchange value."
- The CPU compares the "compare" value against the actual value of the atomic value. If they are the same, it sets the atomic value to the "exchange value". If they are different, it does nothing
- Returns the old value (hence why it's called an exchange)
- All of these operations are done atomically, so that no two threads can collide. At the CPU level, this is usually done by forcing the values to go all the way to the main memory banks, skipping caches, and holding the "bus lock" line high during the entire process so that no other processor can interrupt.

Those are all the pieces, in a firehose fashion. If you want to know more, I highly recommend wikipediaing any of the unfamiliar nouns. The threading pages are actually quite good there. (Mutex, Condition Variable, Sleeping Barber, and Compare Exchange are all good searches)

score 0 · Answer 2 · answered Jul 08 '14 at 17:42

Assume a monitor to be a lock and your house as a resource. Now, Whenever any thread wants to enter your house, it has to first acquire the lock to your house. As long as that thread holds the lock, no other thread can enter the house (access that resource).

wait() --> The thread which is currently holding the lock, releases the lock on that object/ resource and waits for other threads to notify() it so that it can continue its execution.

notify() --> The current thread notifies that it is releasing the lock on that resource. All threads which are waiting for this lock, get notified. The OS is smart enough to pick one of these threads, provide the lock to it, and execute it..

How do monitors enforce mutual exclusion?

2 Answers2