Performance cost of MESI protocol?

Question

The MESI (Modified, Exclusive, Shared, Invalid) protocol is used by CPU caches to communicate and ensure they all use the latest value of a cache line. When one CPU modifies a cache line, all other CPUs holding a copy of that line are notified of the change.

However, in all the literature I have read on MESI, I haven't seen whether there is any performance cost while the protocol communicates. Would this cost just be part of the cost of the x86 LOCK prefix? I am fairly certain MESI is used even when the x86 LOCK prefix is not, is that right?

NB: Intel actually uses the MESIF protocol, where F is an additional "Forwarding" state.

The assumption that a cache coherency protocol is only needed when executing an atomic instruction is wrong. And the performance hit depends very much on the situation. But one very well-known example of the performance hit of a cache coherency protocol is [false sharing](http://en.m.wikipedia.org/wiki/False_sharing). – Voo, Nov 25 '14 at 11:52
This is too broad: there are many types of caches (varying in parameters like inclusiveness, WB/WT, private/shared, etc.) and many flavors of MESI, sometimes even multiple types within the same CPU. Which one are you asking about? What alternative do you want to examine as a baseline? – Leeor, Nov 25 '14 at 14:22

score 1 · Answer 1 · answered Jan 14 '16 at 20:17

Yes, the MESI(F) protocol is used on all memory operations (i.e. reads and writes). Imagine you've written something into the cache (so the line is in the 'M' state) and now that line has to be evicted. The protocol says you need to write it back to memory. If the protocol weren't used, we'd either need to always write through to memory (a huge bandwidth cost) or have inconsistent memory (a bad idea).

That is, even if there's no sharing, the MESI protocol is still used; in that case the lines are in the 'E', 'I', or 'M' states and 'S' is never used.
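To make the state transitions above concrete, here is a minimal sketch of a toy model that tracks a single cache line across the private caches of a few cores. The names (`ToyMesi`, `bus_messages`, `writebacks`) are made up for illustration, and the model simplifies real hardware: a dirty line requested by another core is counted as a write-back, whereas real CPUs may forward it cache-to-cache.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class ToyMesi:
    """Tracks the MESI state of ONE cache line in each core's private cache."""

    def __init__(self, cores):
        self.state = [State.I] * cores
        self.bus_messages = 0  # coherence requests broadcast to other caches
        self.writebacks = 0    # dirty copies flushed out of a cache

    def _others(self, core):
        return [c for c in range(len(self.state)) if c != core]

    def read(self, core):
        if self.state[core] is not State.I:
            return  # hit in M, E or S: no bus traffic
        self.bus_messages += 1  # read miss is announced on the bus
        shared = False
        for c in self._others(core):
            if self.state[c] is State.M:
                self.writebacks += 1  # simplification: dirty copy is flushed
            if self.state[c] is not State.I:
                self.state[c] = State.S
                shared = True
        self.state[core] = State.S if shared else State.E

    def write(self, core):
        if self.state[core] is State.M:
            return  # already owned and dirty
        if self.state[core] is State.E:
            self.state[core] = State.M  # silent upgrade, no bus traffic
            return
        self.bus_messages += 1  # from S or I: request ownership
        for c in self._others(core):
            if self.state[c] is State.M:
                self.writebacks += 1
            self.state[c] = State.I  # other copies are invalidated
        self.state[core] = State.M

    def evict(self, core):
        if self.state[core] is State.M:
            self.writebacks += 1  # dirty line must be written back
        self.state[core] = State.I


if __name__ == "__main__":
    line = ToyMesi(cores=2)
    line.read(0)   # I -> E
    line.write(0)  # E -> M, no extra bus message
    line.evict(0)  # M -> I, one write-back
    print(line.state[0], line.bus_messages, line.writebacks)
```

In this sketch a line touched by only one core cycles through I, E and M; 'S' only shows up once a second core reads the same line.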

A side note: remember that almost all applications on a system typically share some dynamic library code. Where do you think that code resides, and how is access to it managed?

Now, to answer your question on the performance impact: yes, implementing MESI(F) or any coherence protocol has an impact on performance, but that impact is actually positive compared to the case where no coherence protocol exists. In that case, every read/write would need to go to main memory (i.e. your application would be hundreds of times slower).

So, bottom line: although the MESI(F) protocol does have a negative impact on bandwidth, overall it has a positive impact on performance. It actually buys us a lot of performance (and power) compared to the case where we don't use a cache coherence protocol (i.e. no cache).

score 0 · Answer 2 · answered Nov 25 '14 at 11:13

The MESI protocol works by exchanging messages on the inter-CPU bus (however that bus is implemented). That bus has limited throughput, so one can saturate it by using atomic instructions.

This is why poorly written applications that needlessly use atomic instructions adversely affect an entire machine.
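As a rough illustration of how that message traffic adds up, here is a toy count of ownership transfers when two cores write in strict alternation. It is only a sketch under stated assumptions: `coherence_messages` is a hypothetical helper, the 64-byte line size is assumed (typical on x86), and the model does not distinguish atomic from plain stores, since either kind of store needs ownership of the line before it can complete.

```python
LINE_SIZE = 64  # bytes per cache line; assumed, typical on x86


def coherence_messages(addr_a, addr_b, writes_per_core):
    """Count bus messages when core 0 writes addr_a and core 1 writes addr_b
    in strict alternation.

    Toy model: each line has at most one owner, and a write by a core that
    does not own the line costs one message (a request for ownership that
    invalidates the previous owner's copy).
    """
    owner = {}  # cache line index -> core currently owning the line
    messages = 0
    for _ in range(writes_per_core):
        for core, addr in ((0, addr_a), (1, addr_b)):
            line = addr // LINE_SIZE
            if owner.get(line) != core:
                messages += 1
                owner[line] = core
    return messages


if __name__ == "__main__":
    # Addresses 0 and 8 fall in the same 64-byte line; 0 and 64 do not.
    print(coherence_messages(0, 8, writes_per_core=1000))   # 2000
    print(coherence_messages(0, 64, writes_per_core=1000))  # 2
```

In this model every write to the shared line costs a message, while writes to separate lines only pay once per core. Real hardware is more forgiving than strict alternation, but the contrast is the same one behind false sharing.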

Maxim Egorushkin

You need a cache coherency protocol on x86 even if you don't use atomic instructions. The reason atomic instructions are more expensive is that they have to lock the bus. – Voo Nov 25 '14 at 11:36
@Voo I am pretty sure I do not. – Maxim Egorushkin Nov 25 '14 at 11:37
Oh you do. Simple example: If you didn't, false sharing wouldn't be a thing. – Voo Nov 25 '14 at 11:39
Of course you do, cache coherency is more extensive than mere atomicity. If one core writes a line and another reads it later on, you can't rely on the modification to get to the memory / shared cache level by itself. – Leeor Nov 25 '14 at 14:20
