3

I'm studying multithreading and trying to understand the concept of semaphores and mutual exclusion. Most of the examples I find online use some sort of library (e.g. pthread) to implement the semaphore or mutex, but I'm more interested in the implementation of a simple semaphore that establishes a critical section -- no more than one thread accessing a particular region of memory.

For this task, I believe I would need a mutex (a.k.a. a binary semaphore if I understand the terminology correctly). I can see how the semaphore would prevent a race condition by "locking" the section of code to a single thread, but what prevents a race condition from occurring at the semaphore itself?

I imagine a binary semaphore to hold an int value to keep track of the lock:

Semaphore
---------
int lock = 1;

unsigned P(void){
    if(lock > 0){
        lock--;
        return 0; /* success */
    }
    return 1; /* fail */
}

void V(void){
    lock++;
}

Suppose two threads call the P function at the same time: they both reach the if(lock > 0) check simultaneously and evaluate the condition as true -- this creates a race condition where both threads are granted access to the same region of memory at the same time.

So what prevents this race condition from occurring in real world implementations of semaphores?

Vilhelm Gray

3 Answers

4

Locking and releasing semaphores and/or mutexes happens via atomic operations; this means the CPU cannot be taken away from the current process in the middle of one. This ensures that as soon as a mutex lock is started (it consists of a single CPU instruction, or just a few), the process keeps the CPU until the locking/releasing is done.
There are also different ways to implement threading: either with direct support by the CPU and kernel (kernel-space) or through a library (such as pthreads) in user-space.


From OSDev.org

An atomic operation is an operation that will always be executed without any other process being able to read or change state that is read or changed during the operation. It is effectively executed as a single step, and is an important quality in a number of algorithms that deal with multiple independent processes, both in synchronization and algorithms that update shared data without requiring synchronization.


Here is a nice article on atomicity, too (although its examples are in Delphi).

bash.d
  • So once a thread calls the `P` function, it executes the instructions as "atomic" operations which guarantee that the CPU does not switch to another process while in the middle of the call. Does this also prevent another CPU from calling the `P` function at the same time and clobbering the memory? – Vilhelm Gray Apr 23 '13 at 13:24
  • First is true, this is guaranteed. Secondly, all processes are scheduled by the scheduler and due to virtual memory there is no other CPU that can corrupt that memory, at least not under normal conditions. – bash.d Apr 23 '13 at 13:26
  • @bash.d Almost all threading implementations allow multiple CPUs to access the same virtual address space at the same time. Nobody bothers with guaranteeing atomicity through the scheduler, that would be way too slow for most practical applications. – Art Apr 23 '13 at 13:30
  • The same virtual address-space does not refer to the actual address in memory, so this doesn't matter in this case. Apart from that, who does guarantee? – bash.d Apr 23 '13 at 13:34
  • If the operating system doesn't guarantee atomicity through the scheduler, is it the motherboard that prevents a second CPU from corrupting the memory? – Vilhelm Gray Apr 23 '13 at 13:36
  • No, the motherboard is just the hardware... It is the software that matters. – bash.d Apr 23 '13 at 13:36
  • @VilhelmGray The short answer is: yes, the motherboard helps in preventing races here. In general there's a complex protocol for how CPUs are allowed to talk to memory. Each cache line of memory can only be written to by one CPU at a time and there are ways the CPUs signal each other to ensure that writes are properly seen by other CPUs and other devices in the machine. – Art Apr 23 '13 at 13:40
  • These are micro-operations you are talking about, which is a running program and not the pure hardware. – bash.d Apr 23 '13 at 13:42
  • @bash.d In the article you posted, it describes how `if L = 1 then N:= N + K;` in **thread 2** can grab `K` equal to `0` -- but how is this possible if **thread 1** is the only thread setting the memory and it's executing atomically? I thought reordering cannot occur within a thread executing atomically -- and the only thread able to change the values is executing atomically -- so wouldn't `K` have to be equal to `5` if `L` equals `1`? – Vilhelm Gray Apr 23 '13 at 13:55
  • 1
    No, the **locking** operations occur atomically. The entire locked are will not be executed atomically. And remember, threads share a common address-space and are all bound to a process. – bash.d Apr 23 '13 at 14:01
  • Yes, this might happen. You cannot predict any sequence, though – bash.d Apr 23 '13 at 14:05
  • So my final question would be: how do I get a series of operations to execute atomically in C? – Vilhelm Gray Apr 23 '13 at 14:05
  • Well, `atomically` is not possible in this sense, but if you want to execute them only from a single thread in a multithreaded process, use a `mutex` or a `semaphore`. This is what they exist for. See [Wikipedia](http://en.wikipedia.org/wiki/Mutex) for a little information. – bash.d Apr 23 '13 at 14:06
  • Wait, but then we're back to the original problem: that my naively implemented binary semaphore function can be called by two threads at the same time, unless the instructions within the call can execute atomically -- the question is how do I guarantee my implementation of `P` is executed atomically as I expect well-known implementations such as `pthread` to guarantee; does this just boil down to processor-specific instructions rather than C? – Vilhelm Gray Apr 23 '13 at 14:18
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/28750/discussion-between-vilhelm-gray-and-bash-d) – Vilhelm Gray Apr 23 '13 at 14:20
  • No, not really... In your code you are having a potential race-condition, because your locking-mechanism is not atomic. Thread A locks the sequence you want to access... If Thread B now goes to the `lock`-statement, it will block, because it cannot enter the locked statement. When Thread A releases the lock, Thread B might be woken up and claim the lock for itself. – bash.d Apr 23 '13 at 14:22
3

The most common (although definitely not the only) way to implement most locking primitives is the compare-and-set instruction (also called compare-and-swap). A normal move instruction would just set the value of a memory location to whatever value you ask it to, while a compare-and-set instruction says "atomically set this memory location to value X only if the current value of the memory location is Y, and record in a flag whether the operation succeeded". The key word "atomic" means the CPU guarantees in hardware that nothing else can interfere with that operation.

Using a compare-and-swap instruction (assumed here to return non-zero on success), your example P could be implemented as:

int oldlock;
retry:
oldlock = lock;
if (oldlock > 0) {
    /* Atomically: if lock still equals oldlock, set it to oldlock - 1.
       If another thread changed lock in the meantime, the swap fails
       and we retry with the fresh value. */
    if (!compare_and_swap(&lock, oldlock, oldlock - 1))
        goto retry;
    return 0; /* success */
}
return 1; /* fail */

Of course reality is much more complex than that, but compare-and-swap is easy to understand and explain, and it has the nice property that it can implement (almost?) all other locking primitives.

Here's a Wikipedia article.

Art
2

The difference between a semaphore (or a mutex) and a "normal" variable isn't that big. The libraries that offer this functionality simply make sure that the semaphore is only ever accessed through atomic operations. There are multiple ways to achieve that:

  • Special assembly instructions which guarantee atomic access, e.g.: TSL or XCHG.

  • Turning off interrupts before the variable is accessed and turning them back on again afterwards, so the scheduler can't remove your process from the CPU. But be aware that this only works on single-CPU systems.

  • Using language-specific features like Java's synchronized keyword.


An example of how to use the TSL instruction:

enter_region:        ; A "jump to" tag; function entry point.

  tsl reg, flag      ; Test and Set Lock; flag is the
                     ; shared variable; it is copied
                     ; into the register reg and flag
                     ; then atomically set to 1.

  cmp reg, #0        ; Was flag zero on enter_region?

  jnz enter_region   ; Jump to enter_region if
                     ; reg is non-zero; i.e.,
                     ; flag was non-zero on entry.

  ret                ; Exit; i.e., flag was zero on
                     ; entry. If we get here, tsl
                     ; will have set it non-zero; thus,
                     ; we have claimed the resource
                     ; associated with flag.

leave_region:
  move flag, #0      ; store 0 in flag
  ret                ; return to caller

By the way, as you already and correctly pointed out, a mutex is just a special kind of semaphore, allowing only FALSE (represented by 0 in C) and TRUE (represented by 1 or any other non-zero value) for its internal int variable, thus making it a so-called binary semaphore.

winklerrr