Atomic operation seems slower than Semaphore operations in multithreading shared process

Question

Can you think of any good reason why atomic operations seem slower than semaphores, even though they involve fewer instructions?

Sample code:

void increment(){
    if (strcmp(type, "ATOMIC") == 0) {
        for (int i = 0; i < RUN_TIME; ++i) {
            atomic_fetch_add_explicit(&count, 1, memory_order_relaxed);
        }
    }
    if (strcmp(type, "SEMAPHORE") == 0) {
        for (int i = 0; i < RUN_TIME; ++i) {
            sem_wait(sem);
            count++;
            sem_post(sem);
        }
    }
}

Output:

$ time ./CMAIN "SEMAPHORE"; time ./CMAIN "ATOMIC"
[C] SEMAPHORE, count 4000000

real    0m0.039s
user    0m0.029s
sys     0m0.002s
[C] ATOMIC, count 4000000

real    0m0.092s
user    0m0.236s
sys     0m0.003s

Have you tried with an actual multi-threaded application? This looks like it is performing serial operations so there is no resource contention and the semaphore never has to wait. — Christian Gibbons, Nov 10 '17 at 15:18
Without giving us compiler and platform, you can only expect some handwaving speculative answer. — Jens Gustedt, Nov 10 '17 at 16:54
'decrease in instructions'. You mean fewer lines of C code? That's totally irrelevant. — pm100, Nov 10 '17 at 16:59
Is it really? If more instructions are pushed into the pipeline, and thus generating a higher total latency, is this irrelevant? — Bruno Miguel, Nov 23 '17 at 15:48
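To check the contention point raised in the comments, here is a minimal sketch of a multi-threaded version of the benchmark. It is not the asker's program. The names `run_contended`, `atomic_worker`, `sem_worker` and `MAX_THREADS` are made up for illustration, and it assumes a Linux/POSIX system with pthreads and unnamed semaphores (`sem_init`).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

#define MAX_THREADS 64            /* arbitrary cap, chosen for this sketch */

static atomic_int atomic_count;   /* shared counter for the atomic variant */
static int plain_count;           /* shared counter protected by the semaphore */
static sem_t guard;               /* binary semaphore used as a lock */
static int iterations;            /* increments per thread, set before threads start */

static void *atomic_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < iterations; ++i)
        atomic_fetch_add_explicit(&atomic_count, 1, memory_order_relaxed);
    return NULL;
}

static void *sem_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < iterations; ++i) {
        sem_wait(&guard);         /* may block in the kernel under contention */
        plain_count++;
        sem_post(&guard);
    }
    return NULL;
}

/* Starts nthreads workers that all hit the same counter, waits for them,
 * and returns the final counter value. */
int run_contended(int use_atomic, int nthreads, int iters) {
    pthread_t tid[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    iterations = iters;
    atomic_store(&atomic_count, 0);
    plain_count = 0;
    int rc = sem_init(&guard, 0, 1);
    assert(rc == 0);

    for (int i = 0; i < nthreads; ++i) {
        rc = pthread_create(&tid[i], NULL,
                            use_atomic ? atomic_worker : sem_worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; ++i)
        pthread_join(tid[i], NULL);

    sem_destroy(&guard);
    return use_atomic ? atomic_load(&atomic_count) : plain_count;
}
```

Calling `run_contended(1, 4, 1000000)` and `run_contended(0, 4, 1000000)` from a small `main` and timing each run separately (built with something like `gcc -O2 -pthread`) would compare the two variants with four threads hitting the same counter. Results will depend on the compiler, the libc and the CPU.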

Petr Skocik · Answer 1 · 2017-11-10T17:07:59.897

Can't reproduce. For 10^9 iterations, I'm getting (from bash, i5, x86_64, Linux):

$ TIMEFORMAT="%RR %UU %SS"
$ gcc atomic.c -Os -lpthread && ( time ./a.out ATOMIC  ; time ./a.out  SEMAPHORE )
1.572R  1.568U  0.000S  #ATOMIC
5.542R  5.536U  0.000S  #SEMAPHORE

(About the same ratio for 4000000 iterations.)

My atomic.c (your example with the blanks filled in):

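#include <stdio.h>
#include <string.h>
#include <stdatomic.h>
#include <semaphore.h>

#define RUN_TIME 100000000

char *type;
sem_t *sem;

_Atomic int count = 0;

void increment(void)
{
    if (strcmp(type, "ATOMIC") == 0) {
        for (int i = 0; i < RUN_TIME; ++i) {
            atomic_fetch_add_explicit(&count, 1, memory_order_relaxed);
        }
    }
    if (strcmp(type, "SEMAPHORE") == 0) {
        for (int i = 0; i < RUN_TIME; ++i) {
            sem_wait(sem);
            count++;  /* count is _Atomic, so this is itself a seq_cst atomic increment */
            sem_post(sem);
        }
    }
}

int main(int C, char **V)
{
    if (C < 2) {  /* strcmp on a missing argument would crash */
        fprintf(stderr, "usage: %s ATOMIC|SEMAPHORE\n", V[0]);
        return 1;
    }
    sem_t s;
    sem_init(&s, 0, 1);  /* unnamed semaphore, initial value 1, not shared between processes */
    sem = &s;
    type = V[1];
    increment();
    printf("[C] %s, count %d\n", type, count);
    return 0;
}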

Please post an MCVE, along with your platform specs.

score 0 · Answer 2 · answered Nov 10 '17 at 15:14

It shouldn't be, because from what I have read, when a process tries to acquire a semaphore that is not available, the semaphore puts the process on a wait queue (FIFO) and puts the task to sleep, which costs more time and CPU overhead than an atomic operation.

Normally an atomic operation is faster because it performs the load, modify and store steps together as one indivisible operation. But atomic operations are CPU-specific: there is no guarantee that n++ is executed as a single instruction (INC). It is up to the CPU, and that may be why you are getting this output.
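As a rough illustration of the CPU-specific part, C11 lets a program ask whether an atomic type is implemented without a hidden lock on the current target. The sketch below is an assumption-laden example, not something from the question: `counter_is_lock_free`, `bump` and `self_check` are hypothetical names. It only shows the standard `atomic_is_lock_free` query next to an explicit atomic increment, which is what guarantees an indivisible update no matter which instruction the compiler emits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int n;   /* zero-initialized shared counter */

/* True if this implementation handles atomic_int without an internal lock. */
bool counter_is_lock_free(void) {
    return atomic_is_lock_free(&n);
}

/* Atomically adds 1 and returns the value after the increment.
 * The read-modify-write is indivisible regardless of the instruction used. */
int bump(void) {
    return atomic_fetch_add_explicit(&n, 1, memory_order_relaxed) + 1;
}

/* Single-threaded sanity check: two increments yield consecutive values. */
void self_check(void) {
    int first = bump();
    int second = bump();
    assert(second == first + 1);
}
```

A plain `int` incremented with `n++` carries no such guarantee when several threads touch it, even if the compiler happens to emit a single instruction for it.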

This is my understanding; suggestions are appreciated.

No, on any decent implementation, as long as you are on the "fast path", semaphores (and mutexes and the like) are no slower than atomics. They only go into a kernel wait or similar if there is contention. — Jens Gustedt, Nov 10 '17 at 16:53
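To make the "fast path" idea concrete, here is a toy semaphore sketch. It is not how glibc or any real libc implements `sem_wait`. The `toy_sem` type and its functions are hypothetical, and the slow path simply yields the CPU as a stand-in for a real kernel wait. The point is only that when a token is available, acquiring it costs one atomic compare-and-swap and no system call.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy type, for illustration only. */
typedef struct {
    atomic_int value;   /* number of available tokens */
} toy_sem;

void toy_init(toy_sem *s, int initial) {
    assert(initial >= 0);
    atomic_init(&s->value, initial);
}

/* Fast path: take a token with a compare-and-swap if one is available.
 * Returns false without waiting when the count is zero. */
bool toy_try_wait(toy_sem *s) {
    int v = atomic_load_explicit(&s->value, memory_order_relaxed);
    while (v > 0) {
        if (atomic_compare_exchange_weak_explicit(&s->value, &v, v - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
        /* A failed CAS reloaded v; the loop condition re-checks it. */
    }
    return false;
}

/* Slow path only when the fast path fails. A real implementation would
 * block in the kernel here; this sketch just yields and retries. */
void toy_wait(toy_sem *s) {
    while (!toy_try_wait(s))
        sched_yield();
}

/* Returns a token; release ordering publishes the protected writes. */
void toy_post(toy_sem *s) {
    atomic_fetch_add_explicit(&s->value, 1, memory_order_release);
}
```

With no contention, a `toy_wait`/`toy_post` pair amounts to two atomic read-modify-write operations, which is in the same cost range as the atomic increment it would be compared against.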
