
When writing lock-free code with compare-and-swap (CAS), you can run into the ABA problem:

http://en.wikipedia.org/wiki/ABA_problem

Comparing only on the value "A" is problematic because another thread could have changed it to "B" and back to "A" between the two observations. I read on and found this solution:

http://en.wikipedia.org/wiki/LL/SC

In computer science, load-link and store-conditional (LL/SC) are a pair of instructions used in multithreading to achieve synchronization. Load-link returns the current value of a memory location, while a subsequent store-conditional to the same memory location will store a new value only if no updates have occurred to that location since the load-link. Together, this implements a lock-free atomic read-modify-write operation.

How would a typical C++ lock-free CAS technique be modified to use the above solution? Would somebody be able to show a small example?

I don't mind whether it's C++11 on x86 or x86-64, or Linux-specific (preferably no Win32 answers).

Flow
user997112

1 Answer


LL/SC are instructions implemented by some architectures (e.g. ARM, PowerPC, MIPS) to form the foundation of higher-level atomic operations. On x86 you instead have LOCK-prefixed instructions such as LOCK CMPXCHG, which is a plain compare-and-swap, to accomplish a similar goal.

To avoid the ABA problem on x86 with LOCK you have to provide your own protection against intervening stores. One way to do this is to store a generation number (just an increasing integer) adjacent to the memory in question. Each updater does an atomic compare/exchange wide enough to encompass both the data and the generation number. The update only succeeds if it finds the right data and the right generation number. At the same time, it increments the generation number so that other threads see that a change has happened, even if the data ends up looking the same.

You'll note that x86 has, since the Pentium, offered a CMPXCHG variant that is twice as wide as a machine word (CMPXCHG8B in 32-bit mode, and later CMPXCHG16B on all but the earliest x86-64 CPUs), which can be used for this purpose.
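A minimal C++11 sketch of the generation-number approach described above, using `std::atomic` rather than hand-written assembly. The names `Tagged` and `tagged_update` are made up for illustration. To keep it portable, the sketch pairs a 32-bit value with a 32-bit generation so that the whole thing fits in one 64-bit CAS. Pairing a 64-bit pointer with a counter needs a 16-byte CAS instead, and whether `std::atomic` turns that into an inline CMPXCHG16B depends on your compiler and flags (with GCC, for example, `-mcx16` is relevant).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical type: data plus generation number, 8 bytes in total, no padding.
struct Tagged {
    std::uint32_t value;       // the data being protected
    std::uint32_t generation;  // incremented on every successful update
};

// Store new_value only if BOTH the value and the generation still match `seen`.
// Returns true on success. On failure, `seen` is overwritten with the current
// contents of the cell, so the caller can recompute and retry.
inline bool tagged_update(std::atomic<Tagged>& cell, Tagged& seen,
                          std::uint32_t new_value) {
    Tagged desired{new_value, seen.generation + 1u};
    return cell.compare_exchange_strong(seen, desired,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire);
}
```

If one thread reads `{A, gen 0}` and, before it gets to its CAS, other threads change the value to B and back to A, the cell now holds `{A, gen 2}`. The first thread's CAS fails even though the value matches, because the generation does not. The counter can wrap around, so this makes ABA very unlikely rather than strictly impossible.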

Ben Jackson
  • Are you saying that CMPXCHG implements CAS2/Double CAS? – user997112 May 30 '14 at 14:34
  • `CMPXCHG8B` (on a 32-bit system) or `CMPXCHG16B` (on a 64-bit system) are more like "double *width* CAS". A general CAS2 would work on non-contiguous memory. – Ben Jackson May 30 '14 at 16:25
  • Assuming you are using CAS on a pointer, remember that on 64-bit Windows there are 23 unused bits in an address that is 8-byte aligned. I just use a union to encapsulate a 64-bit number and a struct with my ptr/aba members. On Linux, unless you want to address the full 64-bit range, you can use the same trick. In my experience, 128-bit CAS is slower than 64-bit CAS, so I always try to cram everything into 64 bits. – johnnycrash Jun 16 '14 at 18:03
  • If you have the ABA problem with pointers, then you also have a lifetime management issue. At least I don't know of any way of having the ABA problem with pointers without at some point holding a dangling pointer. Herb Sutter has a talk on this, and the solution he proposes is implemented in MSVC for C++20. Hopefully it will be implemented everywhere some day: https://youtu.be/CmxkPChOcvw – Balthazar Jul 18 '21 at 12:40
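The pointer-packing trick from johnnycrash's comment can be sketched roughly as follows. This is only an illustration under a loud assumption: that every pointer you store fits in the low 48 bits, which is typical for user-space addresses on current x86-64 Linux but is not guaranteed by C++ or by every platform. The helper names (`pack`, `unpack_ptr`, `unpack_tag`, `swap_ptr`) are invented, and shifts and masks are used here instead of the union mentioned in the comment.

```cpp
#include <atomic>
#include <cstdint>

// ASSUMPTION: pointers fit in the low 48 bits; the top 16 bits carry the ABA tag.
constexpr int kTagShift = 48;
constexpr std::uint64_t kPtrMask = (std::uint64_t{1} << kTagShift) - 1;

inline std::uint64_t pack(void* p, std::uint16_t tag) {
    return (static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p)) & kPtrMask)
         | (static_cast<std::uint64_t>(tag) << kTagShift);
}

inline void* unpack_ptr(std::uint64_t word) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(word & kPtrMask));
}

inline std::uint16_t unpack_tag(std::uint64_t word) {
    return static_cast<std::uint16_t>(word >> kTagShift);
}

// Install new_ptr only if the packed word (pointer AND tag) still equals `seen`.
// On failure, `seen` is refreshed with the current packed word.
inline bool swap_ptr(std::atomic<std::uint64_t>& cell, std::uint64_t& seen,
                     void* new_ptr) {
    std::uint64_t desired =
        pack(new_ptr, static_cast<std::uint16_t>(unpack_tag(seen) + 1));
    return cell.compare_exchange_strong(seen, desired,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire);
}
```

A 16-bit tag wraps after 65,536 successful updates, so this trades some ABA protection for a single-word CAS. As Balthazar's comment points out, it also does nothing about the lifetime of whatever the pointer refers to.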