
My understanding is that atomic machine instructions may be up to two orders of magnitude slower than a non-atomic operation. For example, given

int x;
x++;

and

std::atomic<int> y;
y++;

my understanding is that x++ typically runs much faster than y++. (I'm assuming that the increment operation maps to an underlying machine instruction. I'm sure the exact comparative cost varies from architecture to architecture, but I'm talking about a rule of thumb.)

I'm interested in the relative cost of an atomic RMW operation and a non-inline function call, again, as a general rule of thumb. For example, given this non-inline function,

void f(){}

what can we generally say about the cost of y++ (i.e., the atomic increment) compared to the cost of executing a non-inline call to f?
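
To make the three operations being compared concrete, here is a minimal sketch (the file-scope variables and the GCC/Clang `noinline` attribute are my additions, and the per-line comments describe typical rather than guaranteed code generation):

```cpp
#include <atomic>

int x = 0;               // ordinary int at namespace scope
std::atomic<int> y{0};   // its atomic counterpart

// The non-inline function being compared against (noinline attribute
// added to keep the call out of line; GCC/Clang syntax).
__attribute__((noinline)) void f() {}

void demo()
{
    x++;   // typically a single add/inc on the memory location
    y++;   // typically a lock-prefixed add (x86) or an LL/SC loop (ARM)
    f();   // call + return through the non-inlined function
}
```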

My motivation is to try to put the common claim that "atomic operations are much more expensive than non-atomic operations" in perspective. One way to do that is to try to get some idea how expensive an atomic RMW operation is compared to calling and returning from a non-inline function.

Please don't reply with "the only way to know is to measure." I'm not asking about an atomic RMW operation versus a function call in a particular context on a particular architecture. I'm asking about a general rule of thumb that could be used as the basis of discussion for people who might think, "We can never use atomic instructions, because they're too expensive," yet who wouldn't think twice about making function calls.

KnowItAllWannabe
  • *"I'm asking about a general rule of thumb ..."*. I *believe* **the general rule** is that there is NO GENERAL RULE. The machine architecture and the instructions vary a lot; they've been modified/changed/improved a lot in the past and still continue to change/improve. – Nawaz Feb 25 '14 at 05:44
  • For any function call, I think the instruction pointer jumps to the memory address where the function's instructions start, those instructions are loaded and executed (e.g. the x++), the value is returned, and execution jumps back to the instruction after the call. So by my estimate atomic operations / non-inline function calls are around 2-5x slower than the plain operator. Correct me if I am wrong. – Nachiket Kate Feb 25 '14 at 05:45
  • I benchmarked a swap of a small array in assembly, comparing the sequence `mov eax,array1` / `xchg eax,array2` / `mov array1,eax` with the sequence `mov eax,array1` / `mov ebx,array2` / `mov array1,ebx` / `mov array2,eax`. Note that `xchg` with a memory operand is an auto-lock instruction. The `xchg` sequence took 12.5 times as long as the `mov` sequence on an Intel 2600K CPU. – rcgldr Feb 25 '14 at 09:02
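
In C++ terms, the two sequences in rcgldr's comment roughly correspond to an atomic exchange versus a plain swap; a sketch (that the compiler emits `xchg` for the atomic exchange is an assumption about typical x86 code generation):

```cpp
#include <atomic>
#include <utility>

std::atomic<int> a{0};
int b1 = 0, b2 = 0;

void swaps(int v)
{
    // Atomic exchange: on x86 this typically compiles to an xchg with a
    // memory operand, which carries an implicit lock.
    int old = a.exchange(v);
    (void)old;

    // Plain swap: ordinarily just a handful of mov instructions, no locking.
    std::swap(b1, b2);
}
```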

1 Answer


The question as asked has issues.

One is that you use pseudo-code whose storage class isn't clear and that appears to operate on local objects. A local atomic object is meaningless: atomic operations are for objects shared between threads.

A compiler could well notice that a non-volatile local variable is used only within one function and not generate any special atomic operation for it (though I don't know of any compiler that presently does that).
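
As a hypothetical illustration of such a case (whether any compiler actually performs this optimization is, as said, unknown):

```cpp
#include <atomic>

int local_count()
{
    // The atomic never escapes this function and is not volatile, so no
    // other thread can observe it; a compiler would be allowed to treat
    // these increments as ordinary non-atomic ones.
    std::atomic<int> n{0};
    for (int i = 0; i < 10; ++i)
        n.fetch_add(1, std::memory_order_relaxed);
    return n.load(std::memory_order_relaxed);
}
```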

We have to assume that the object is not local (or is volatile).

The cost of any memory operation depends a lot on caching. If the location is not in our cache the operation will be much more costly.

The top of the stack (its most recently used part) is almost always in cache.

By definition the value of shared objects must travel between caches (they are modified or read by multiple threads).
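
A sketch of that contrast (the thread and iteration counts are arbitrary): two threads hammering one shared atomic force its cache line to move back and forth between cores, while per-thread counters stay in each core's own cache:

```cpp
#include <atomic>
#include <thread>

std::atomic<long> counter{0};

// Both threads write the same cache line on every iteration,
// so it ping-pongs between the cores' caches.
void contended()
{
    auto work = [] {
        for (int i = 0; i < 1000000; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
}

// Each thread increments a private counter that stays in its own cache
// and publishes the total once at the end.
void uncontended()
{
    auto work = [] {
        long local = 0;
        for (int i = 0; i < 1000000; ++i)
            ++local;
        counter.fetch_add(local, std::memory_order_relaxed);
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
}
```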

So what are you really comparing here? Until you say precisely, the question can't be answered.

curiousguy