0

I know about memory coherence protocols for multi-core architectures. MSI for example allows at most one core to hold a cache line in M state with both read and write access enabled. S state allows multiple sharers of the same line to only read the data. I state allows no access to the currently acquired cache line. MESI extends that by adding an E state which allows only one sharer to read, allowing an easier transition to M state if there are no other sharers.

from what I wrote above, I understand that when we write this line of code as part of multi-threaded (pthreads) program:

// temp_sum is a thread local variable
// sum is a global shared variable
sum = sum + temp_sum; 

It should allow one thread to access sum in M state invalidating all other sharers, then when another thread reaches the same line it will request M invalidating again the current sharers and so on. But in fact this doesn't happen unless I add a mutex:

pthread_mutex_lock(&locksum);
// temp_sum is a thread local variable
// sum is a global shared variable
sum = sum + temp_sum; 
pthread_mutex_unlock(&locksum);

This is the only way to have this work correctly. Now why do we have to supply these mutexes? why isn't this handled by memory coherence directly? why do we need mutexes or atomic instructions?

mewais
  • 1,265
  • 3
  • 25
  • 42
  • I have answered a [similar question before](http://stackoverflow.com/questions/34928199/potential-for-race-conditions-in-reader-writer-pseudocode/34932067#34932067). The problem is not in the individual memory read/write, but in the full modification of the value: load, modify, store. – e0k Feb 20 '16 at 05:04
  • 1
    Cache coherency doesn't include registers. – EOF Feb 20 '16 at 15:56

1 Answers1

1

Your line of code sum = sum + temp_sum; although it may seem trivially simple in C, it is not an atomic operation. It loads the value of sum from memory into a register, performs arithmetic on it (adding the value of temp_sum), then writes the result back to memory (wherever sum is stored).

Even though only one thread can read or write sum from memory at a time, there is still an opportunity for a synchronization problem. A second thread could modify sum in memory while the first is manipulating the value in a register. Then the first thread will write what it thinks is the updated value (the result of arithmetic) back to memory, overwriting whatever the second put there. It is this transitional location in a register that introduces the issue. There is more to the notion of "the value of a variable" than whatever currently resides in memory.

For example, suppose sum is initially 4. Two threads want to add 1 to it. The first thread loads the 4 from memory into a register, and adds 1 to make 5. But before this first thread can store the result back to memory, a second thread loads the 4, adds 1, and writes a 5 back to memory. The first thread then continues and stores its result (5) back to the same memory location. Both threads are convinced that they have done their duty and correctly updated the sum. The problem is that sum is 5 and not 6 as it should be.

The mutex ensures that only one thread will load, modify, and store sum at a time. Any second thread will have to wait (be blocked) until the first has finished.

e0k
  • 6,961
  • 2
  • 23
  • 30