
When I use the x86_64 CAS instruction LOCK CMPXCHG, what exactly is locked while the atomic operation (read the value, compare, and write the result back) is in progress:

  1. only one cache line in the L3 cache? (during this time no core can read from or write to that cache line in L3)
  2. or the entire L3 cache? (during this time no core can read from or write to the L3 cache at all)

Is it true that x86_64 Intel CPUs use:

  • the 1st approach when the cache line is in the Exclusive state (MOESI/MESIF)
  • the 2nd approach for any state other than Exclusive
Alex

1 Answer


Neither is accurate. The second is similar to what actually happens on a bus lock, which on modern x86 CPUs is a (hopefully) rare and pathological case, used only when a regular lock can't work. It used to be common on the old 486 and early Pentiums, but on newer products the common case is much simpler: you lock the line in the cache. And since you want the read-modify-write to be as fast as possible, there's no sense in doing this in the L3 either. Instead, you choose the cache closest to the operating core, probably the L1 or some equivalent internal structure.

You can guarantee that the atomic RMW is done safely in the cache even with simple MESI: you first get ownership of the line (as any normal write would need to), and then you can run the atomic flow knowing for sure that no other core has this line. The only problem is that snoops may in theory arrive in the middle, so the usual solution is simply to block snoops for this line until the RMW is done. However, there's no problem with allowing any other activity during that period (such as other requests coming out of the same core, or snoops coming in for other lines). The only other limitation concerns memory ordering, but that's usually handled in the memory unit (where there's still a notion of order) and not at the cache.
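To make the flow above concrete from the software side, here is a minimal C++ sketch of a compare-and-swap loop. It assumes a typical x86-64 compiler, which usually lowers `compare_exchange_weak` on a lock-free 64-bit `std::atomic` to `LOCK CMPXCHG`; that codegen is an assumption, not a guarantee. The names `cas_increment` and `run_counters` are hypothetical helpers invented for this illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Atomically increment `counter` with an explicit compare-and-swap loop.
// Assumption: on x86-64 this typically compiles to LOCK CMPXCHG.
inline void cas_increment(std::atomic<std::uint64_t>& counter) {
    std::uint64_t expected = counter.load(std::memory_order_relaxed);
    // If another core changed the value first, the CAS fails and
    // `expected` is refreshed with the current value, so we just retry.
    while (!counter.compare_exchange_weak(expected, expected + 1,
                                          std::memory_order_seq_cst,
                                          std::memory_order_relaxed)) {
    }
}

// Hypothetical helper: start `threads` workers, each performing `iters`
// CAS increments on one shared counter, and return the final value.
inline std::uint64_t run_counters(unsigned threads, std::uint64_t iters) {
    // A naturally aligned 8-byte atomic cannot straddle two cache lines,
    // so the operand is fully contained in a single line.
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (std::uint64_t i = 0; i < iters; ++i) {
                cas_increment(counter);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

Calling `run_counters(4, 10000)` should return 40000 regardless of how the threads interleave, because each successful CAS applies exactly one increment. Each core has to own the counter's cache line while its RMW executes, so the line moves between cores, but nothing in this code requires locking a whole cache level.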

See also the manual section in this answer - x86 LOCK question on multi-core CPUs

Leeor
  • Thank you! So in most cases the `LOCK` prefix is now ignored in the sense for which it was originally created: locking the bus (the FSB earlier, and now QPI). And in most cases, instead of locking the bus, the CPU core only adds a specific flag to the cache line that disables snoop responses? By "ignored LOCK" I mean that *"if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus."* http://stackoverflow.com/a/3339380/1558037 – Alex Jan 26 '15 at 18:33
  • Exactly. I couldn't find this section in a newer manual (they may have changed the wording), but this assumption should still hold. – Leeor Jan 26 '15 at 20:52