
For a high-performance multi-threading system, is there a deterministic way/methodology to determine which concurrency logic can be implemented using only compare-and-swap (CAS), a.k.a. atomic operations, and which must use locks, semaphores and/or barriers?

My systems always involve a lot of concurrency and multi-threading issues. Some are simple enough that one can quickly work out whether a lock is needed; but for more complicated problems, or when trying to push performance to the extreme, I find I don't have a consistent, deterministic methodology for telling whether a problem can be solved using only CAS. For example:

  1. Typical producer/consumer model. A concurrent queue can solve the problem using CAS only.

  2. Producer/consumer model with many updates but conflated consumption. In this case, if double-buffering is used, a read/write lock must be applied; however, if we use triple-buffering, then using only atomics is essentially possible (a sketch of what I mean appears right after this list).
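Something like this minimal, untested sketch is what I have in mind for case 2 (C++, single producer / single consumer; `TripleBuffer` is just an illustrative name, and it uses a single atomic exchange of the shared "middle" slot rather than a CAS loop, so neither side ever blocks):

```cpp
#include <array>
#include <atomic>

// Hypothetical single-producer/single-consumer triple buffer, for illustration only.
template <typename T>
class TripleBuffer {
public:
    // Producer: fill the private back buffer, then publish it with one exchange.
    void write(const T& value) {
        buffers_[back_] = value;
        // Swap back <-> middle; the "fresh" bit tells the reader there is new data.
        back_ = middle_.exchange(back_ | kFreshBit, std::memory_order_acq_rel) & kIndexMask;
    }

    // Consumer: returns true and fills `out` only if unread data exists.
    // Intermediate values may be skipped, which is exactly the conflation I want.
    bool read(T& out) {
        if (!(middle_.load(std::memory_order_relaxed) & kFreshBit))
            return false;                       // nothing new since the last read
        // Swap front <-> middle; front_ carries no fresh bit, so the flag is cleared.
        front_ = middle_.exchange(front_, std::memory_order_acq_rel) & kIndexMask;
        out = buffers_[front_];
        return true;
    }

private:
    static constexpr unsigned kFreshBit  = 4;   // bit 2: "unread data" flag
    static constexpr unsigned kIndexMask = 3;   // low two bits: buffer index (0..2)

    std::array<T, 3> buffers_{};
    unsigned back_  = 0;                        // owned by the producer only
    unsigned front_ = 1;                        // owned by the consumer only
    std::atomic<unsigned> middle_{2};           // shared slot, swapped atomically
};
```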

Roughly speaking, we could say that if a piece of logic can be separated into several inter-dependent states, each of which needs only CAS, then that logic can be implemented with CAS alone. But applying this to real problems seems much more complicated, and I feel I lack a good methodology for dividing the logic and determining whether such a division is even possible.

Please share your experiences or any methodologies I am not aware of.

Alex Suo
  • are you asking if CAS can be used to convert `any` lock-based concurrent algorithm to a lock-free or wait-free concurrent algorithm? – arunmoezhi Apr 21 '14 at 09:25
  • No, my point is: given a certain lock-based algorithm, is there a systematic way to determine whether it can be converted into a purely CAS-based, wait-free algorithm? – Alex Suo Apr 22 '14 at 08:13
  • If you are talking about universal construction, then yes. Any lock-based data structure can be made lock-free and eventually wait-free. But this universal construction is inefficient and is just used to prove that things can be done. If you want efficiency, then you have to custom design each algorithm. The paper `Impossibility and universality results for wait-free synchronization` by `Maurice Herlihy` discusses this – arunmoezhi Apr 23 '14 at 22:19

1 Answer


Here are my layman's rules of thumb after many years of using atomics.

  1. Your data must be co-located.
  2. Your data must be small.
  3. You don't need anything fancy.
  4. You want fine grained locking.
  5. You need speed.
  6. Mix and match.

Co-located. To use atomics, all the data that must change atomically needs to be contiguous. So a doubly linked list is hard to impossible with atomics, because you have to update pointers on disparate nodes at the same time. A singly linked list is trivial.
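Roughly like this (untested sketch using C++ `std::atomic`): pushing onto a singly linked list only touches one contiguous word, the head pointer, so a single CAS is enough.

```cpp
#include <atomic>

// Sketch of a lock-free push: the only shared word that ever changes is `head`.
struct Node {
    int   value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

void push(int v) {
    Node* n = new Node{v, head.load(std::memory_order_relaxed)};
    // compare_exchange_weak reloads the current head into n->next on failure,
    // so we simply retry until our node becomes the new head.
    while (!head.compare_exchange_weak(n->next, n,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
    }
}
```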

Small. Small means no larger than the largest atomic operation your system allows. To use atomics to update 50 fields in a struct, you make a new struct and then atomically swap a pointer to it.
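A hedged sketch of that pattern (the `Config` type and its fields are made up for illustration): readers always see either the complete old struct or the complete new one, never a half-updated mix.

```cpp
#include <atomic>

// Hypothetical struct with "many fields" that must appear to change together.
struct Config {
    int timeout_ms;
    int max_connections;
    // ... imagine 48 more fields ...
};

std::atomic<Config*> current{new Config{1000, 64}};

void update(const Config& desired) {
    Config* fresh = new Config(desired);   // build the replacement off to the side
    Config* old   = current.exchange(fresh, std::memory_order_acq_rel);
    // Reclaiming `old` safely needs hazard pointers, RCU, reference counting,
    // or similar; that part is deliberately left out of this sketch.
    (void)old;
}

const Config& snapshot() {
    // Readers grab the pointer once and then work from a consistent snapshot.
    return *current.load(std::memory_order_acquire);
}
```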

You don't need anything fancy. When working atomically with singly linked lists, you can only add and remove items at the front of the list. If you can do it with a hash table, a singly linked list, a skip list or an array, you can use atomics. Atomics are great for building a struct and then atomically swapping a pointer to that struct.
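Removing from the front of the same kind of list is also a single CAS, but this is where "nothing fancy" bites: the naive sketch below is only safe if popped nodes are never freed and reused while other threads might still hold pointers to them (the classic ABA problem). Real implementations add hazard pointers, epochs or tagged pointers.

```cpp
#include <atomic>

struct Node {
    int   value;
    Node* next;
};

std::atomic<Node*> head{nullptr};   // same head as in the push sketch above

Node* pop() {
    Node* n = head.load(std::memory_order_acquire);
    // Retry until the list is empty or we swing head from n to n->next.
    while (n && !head.compare_exchange_weak(n, n->next,
                                            std::memory_order_acquire,
                                            std::memory_order_acquire)) {
        // n was reloaded with the current head on failure
    }
    return n;   // caller now owns n; freeing and reusing it naively reintroduces ABA
}
```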

You want fine-grained locking. With an atomic hash table, for example, you "lock" at the bucket level instead of locking the whole table. I suppose you could have a mutex per bucket instead.
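A sketch of what that bucket-level granularity looks like with atomics (names are illustrative, not from any particular library): each bucket owns its own atomic head, so inserts into different buckets never touch the same word at all.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>

struct Entry {
    std::string key;
    int         value;
    Entry*      next;
};

constexpr std::size_t kBuckets = 1024;
// Value-initialized: every bucket head starts out as nullptr.
std::array<std::atomic<Entry*>, kBuckets> buckets{};

void insert(const std::string& key, int value) {
    auto& head = buckets[std::hash<std::string>{}(key) % kBuckets];
    Entry* e = new Entry{key, value, head.load(std::memory_order_relaxed)};
    // Contention is limited to threads that hash into this one bucket.
    while (!head.compare_exchange_weak(e->next, e,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
    }
}
```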

Speed. Mutexes are god-awful slow compared to atomics. Writing a memory allocator that locked a mutex every time you called malloc would suck. I benchmarked mutex vs. atomic about 3 years ago and measured roughly a 40x slowdown.
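For a feel of the gap, here is a crude single-threaded micro-benchmark sketch; the exact ratio varies enormously with platform, contention and the mutex implementation, so treat any number it prints as anecdotal rather than as my 40x figure.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <mutex>

int main() {
    constexpr long N = 10000000;

    long counter = 0;
    std::mutex m;
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < N; ++i) {
        std::lock_guard<std::mutex> g(m);   // lock/unlock around every increment
        ++counter;
    }
    auto t1 = std::chrono::steady_clock::now();

    std::atomic<long> atomicCounter{0};
    for (long i = 0; i < N; ++i)
        atomicCounter.fetch_add(1, std::memory_order_relaxed);
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::printf("mutex : %lld ms\n",
                (long long)std::chrono::duration_cast<ms>(t1 - t0).count());
    std::printf("atomic: %lld ms\n",
                (long long)std::chrono::duration_cast<ms>(t2 - t1).count());
    return counter == atomicCounter.load() ? 0 : 1;
}
```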

Mix and match! I use mutexes in places where atomics would be an utter pain in the ass and would obfuscate the code. I try to limit those situations to code that doesn't execute often, so I don't pay the performance hit.

All these limitations being said, I can easily use atomics in all the hot spots and only rarely have to fall back to mutexes.

johnnycrash