I was reading about the MESI protocol and cannot understand why there is a data race if every write requires exclusive access to the cache line, which in turn invalidates that line in the other cores' caches. Consider this example:

CYCLE # CORE 1                        CORE 2
0   reg = load(&counter);   
1   reg = reg + 1;                reg = load(&counter);
2   store(&counter, reg);         reg = reg + 1;
3                                 store(&counter, reg);

It is said that the overall result is that the variable is incremented only once, although both cores try to increment it (the expected result is two). So my question is: if each core requests exclusive access to the cache line when writing (so the other cores "wait" their turn to modify it and then get exclusive access too), why is there a data race on that variable?

  • You are confusing C++ and hardware with MESI. C++ does not have MESI. It says that data races are UB. Hardware may have MESI and data races may have defined behavior. It's still UB in C++, even when compiled for that platform. – nwp Apr 29 '19 at 13:50
  • I will simplify the question: In my understanding, considering the MESI protocol at the hardware level, there is no chain of events which led to the variable being incremented by only just 1. Can you tell me how that might happen if increment (write) operations in hardware are performed under exclusive cache line access? If one core increments the variable, it invalidates the value in the other core's cache, so that core will increment the newly fetched value (that is, 1) and we should get 2 in the end. – shota silagadze Apr 29 '19 at 13:55
  • I don't know enough about your platform to answer that question, but I can tell that you should remove the C++ tag, because it has nothing to do with C++. You probably have to specify the platform more closely, like x86, before you get an answer that is not "it depends". – nwp Apr 29 '19 at 14:02
  • Data race is a C/C++ memory model concept. There is a **general prohibition on the programmer** and a universal permission for the C/C++ implementation to assume ordinary **objects are not modified under its nose** by other threads (or signal handlers). Races don't produce bad behavior at the CPU level. (There is no such general prohibition in Java: valid programs can have data races, but the behavior of such programs can be affected by legal optimizations.) – curiousguy Apr 30 '19 at 23:04
  • "_there is no chain of events which led to the variable being incremented by only just 1_" So you really have a Q about a possible **race condition**, not "data race". In C/C++, concurrent modifications to atomic objects can have race conditions but don't create a **data** race. Any conflicting modification of anything can create a race condition. You have a race condition when one process renames a file in directory D while another lists D and opens files. You can have a race condition when you do multiple independent SQL queries, if the database is also modified by another client. – curiousguy Apr 30 '19 at 23:09
  • Can you please reword your Q to avoid the confusing "data race" and express it purely in terms of the **atomicity** of a read-modify-write operation? – curiousguy Apr 30 '19 at 23:11

1 Answer

If I read the matter right, MESI is just a red herring here:

0   reg = load(&counter);   

counter has now been loaded into CPU 1's register.

1   reg = reg + 1;                reg = load(&counter);

First processor incremented the value, second one loaded the old one.

2   store(&counter, reg);         reg = reg + 1;

First processor stores value, second one increments its outdated one.

3                                 store(&counter, reg);

Second processor stores the result of calculation based on an outdated value.
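
To make the schedule above concrete, here is a minimal single-threaded sketch that replays the four cycles in order. The names `replay_interleaving`, `reg1`, and `reg2` are made up for illustration; the two local variables merely stand in for each core's private register, and no real concurrency is involved.

```cpp
#include <cassert>

// Replays the four-cycle schedule from the question on one thread.
// reg1 and reg2 are hypothetical stand-ins for the cores' registers.
int replay_interleaving() {
    int counter = 0;         // the shared memory location
    int reg1 = 0, reg2 = 0;  // per-core private registers

    reg1 = counter;          // cycle 0: core 1 loads 0
    reg1 = reg1 + 1;         // cycle 1: core 1 increments its register
    reg2 = counter;          //          core 2 loads the old value 0
    counter = reg1;          // cycle 2: core 1 stores 1
    reg2 = reg2 + 1;         //          core 2 increments its stale copy
    counter = reg2;          // cycle 3: core 2 stores 1 over the existing 1

    assert(reg1 == 1 && reg2 == 1);  // both cores computed the same result
    return counter;
}
```

Calling `replay_interleaving()` returns 1, not 2: the second store is based on a value that was read before the first store happened.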

That should be clear so far. Now, how does this change once we add MESI states?

0   reg = load(&counter);   

counter is in CPU 1 cache, marked E.

1   reg = reg + 1;                reg = load(&counter);

counter still resides in CPU 1 cache, but is loaded into CPU 2 cache, too. So both cache lines need to be marked as S.

2   store(&counter, reg);         reg = reg + 1;

Now counter gets stored back in cache. CPU 1 cache thus needs to be marked as M, while CPU 2 cache gets invalidated (marked as I).

3                                 store(&counter, reg);

As CPU 2 cache got invalidated, it needs to be updated before the store operation can occur, which, in turn, requires CPU 1 cache to be written back to memory first (of course).

But even with all that done, the value in reg was still calculated from an outdated value, and it still overwrites the (now updated) value in the cache...

Adding a final detail: After the operation, CPU 2 cache will be marked M and CPU 1 cache I.
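
The state transitions described above can be traced with a toy model. This is only a sketch under strong assumptions: two cores, one cache line holding a single `int`, no store buffers, and write-back to memory on every ownership change. `ToySystem`, `Mesi`, and `replay_with_mesi` are hypothetical names, not any real hardware or library interface.

```cpp
#include <array>
#include <cassert>

enum class Mesi { M, E, S, I };

// Toy two-core system with one cache line (assumption: the line holds only counter).
struct ToySystem {
    int memory = 0;                               // backing memory for counter
    std::array<int, 2> line{0, 0};                // each core's cached copy
    std::array<Mesi, 2> state{Mesi::I, Mesi::I};  // each core's line state

    int load(int core) {
        int other = 1 - core;
        if (state[core] == Mesi::I) {             // miss: fetch the line
            if (state[other] == Mesi::M) memory = line[other];  // write back dirty data
            line[core] = memory;
            if (state[other] == Mesi::I) {
                state[core] = Mesi::E;            // sole holder
            } else {
                state[core] = Mesi::S;            // both holders now share
                state[other] = Mesi::S;
            }
        }
        return line[core];
    }

    void store(int core, int value) {
        int other = 1 - core;
        if (state[other] == Mesi::M) memory = line[other];  // write back before taking over
        state[other] = Mesi::I;                   // invalidate the other copy
        line[core] = value;
        state[core] = Mesi::M;                    // this core owns the modified line
    }
};

// Replays the schedule from the question; core index 0 is CPU 1, index 1 is CPU 2.
int replay_with_mesi(ToySystem& sys) {
    int reg1 = sys.load(0);  // cycle 0: CPU 1 line becomes E
    reg1 += 1;               // cycle 1
    int reg2 = sys.load(1);  //          both lines become S
    sys.store(0, reg1);      // cycle 2: CPU 1 -> M, CPU 2 -> I
    reg2 += 1;               //          CPU 2's register is untouched by the invalidation
    sys.store(1, reg2);      // cycle 3: CPU 1 written back and -> I, CPU 2 -> M
    return sys.load(1);
}
```

In this model the final value is 1, with CPU 2's line in M and CPU 1's line in I. The invalidation only affects the cache line; the copy already sitting in `reg2` is never refreshed.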

Aconcagua
  • Thanks, I got the point. I had forgotten that at the time of invalidation the variable might already be in a register, where it does not get invalidated. So as I understand it, MESI isn't about data coherency in the program, just between CPU caches, and it kind of orders things, right? – shota silagadze Apr 29 '19 at 15:03
  • @shotasilagadze: coherent caches don't turn separate read + add + write operations into an atomic RMW increment. But they do mean that two different threads can't read different values for the same variable long-term. Also: [Will two atomic writes to different locations in different threads always be seen in the same order by other threads?](//stackoverflow.com/a/50679223) - IRIW reordering where 2 readers disagree on the order of stores from 2 other threads can't normally happen with just MESI, only if some threads can see stores before they become *globally* visible. – Peter Cordes Apr 29 '19 at 19:01
  • @shotasilagadze: but much more simply, having a store buffer (like the x86 TSO memory model) means a thread can see its own stores before they become globally visible, so see https://preshing.com/20120515/memory-reordering-caught-in-the-act – Peter Cordes Apr 29 '19 at 19:02
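
To illustrate the point in the comments about separate read + add + write versus an atomic RMW, here is a small C++ sketch, assuming a C++11 compiler with thread support. `split_increment` and `atomic_increment` are hypothetical names. The first uses individually atomic loads and stores, so there is no data race in the C++ sense, yet increments can still be lost; the second uses `fetch_add`, a single read-modify-write.

```cpp
#include <atomic>
#include <thread>

// Separate atomic load and atomic store: each access is atomic,
// but the load/add/store sequence as a whole is not.
int split_increment(int iterations) {
    std::atomic<int> counter{0};
    auto work = [&] {
        for (int i = 0; i < iterations; ++i) {
            int reg = counter.load(std::memory_order_relaxed);
            counter.store(reg + 1, std::memory_order_relaxed);  // may overwrite a newer value
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter.load();
}

// One atomic read-modify-write per increment: no update can be lost.
int atomic_increment(int iterations) {
    std::atomic<int> counter{0};
    auto work = [&] {
        for (int i = 0; i < iterations; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter.load();
}
```

`atomic_increment(n)` always returns `2 * n`. `split_increment(n)` can return anything up to `2 * n`, and on a multi-core machine it will typically come out lower, although the exact value varies from run to run.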