I'm a bit confused. In production we have two processes communicating via shared memory; part of the data they exchange is a long and a bool. Access to this data is not synchronized. It has been working fine for a long time and still does. I know that modifying a value is not guaranteed to be atomic, but given that these values are modified and read millions of times, shouldn't this have failed by now?
Here is a sample piece of code, which exchanges a number between two threads:
    #include <pthread.h>
    #include <xmmintrin.h>

    typedef unsigned long long uint64;
    const uint64 ITERATIONS = 500LL * 1000LL * 1000LL;

    //volatile uint64 s1 = 0;
    //volatile uint64 s2 = 0;
    uint64 s1 = 0;
    uint64 s2 = 0;

    void* run(void*)
    {
        register uint64 value = s2;
        while (true)
        {
            while (value == s1)
            {
                _mm_pause();// busy spin
            }
            //value = __sync_add_and_fetch(&s2, 1);
            value = ++s2;
        }
    }

    int main (int argc, char *argv[])
    {
        pthread_t threads[1];
        pthread_create(&threads[0], NULL, run, NULL);

        register uint64 value = s1;
        while (s1 < ITERATIONS)
        {
            while (s2 != value)
            {
                _mm_pause();// busy spin
            }
            //value = __sync_add_and_fetch(&s1, 1);
            value = ++s1;
        }
    }
As you can see, I have commented out a couple of things:
//volatile uint64 s1 = 0;
and
//value = __sync_add_and_fetch(&s1, 1);
__sync_add_and_fetch atomically increments a variable.
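To make clear what I expect from the builtin, here is a minimal sketch of my understanding: several threads increment one shared counter through `__sync_add_and_fetch`, and no increment should be lost. The names `CounterJob`, `add_atomically` and `count_with_atomic_adds` are made up for this illustration and are not part of our production code; the 16-thread limit is an arbitrary assumption to keep the sketch short.

```cpp
#include <pthread.h>
#include <cassert>

typedef unsigned long long uint64;

// Hypothetical helper state for this sketch (not production code).
struct CounterJob
{
    uint64* counter;     // shared counter that every thread increments
    uint64  increments;  // number of increments each thread performs
};

// Thread body: atomically add 1 to the shared counter `increments` times.
static void* add_atomically(void* arg)
{
    CounterJob* job = static_cast<CounterJob*>(arg);
    for (uint64 i = 0; i < job->increments; ++i)
    {
        __sync_add_and_fetch(job->counter, 1);
    }
    return NULL;
}

// Starts `thread_count` threads, waits for all of them, and returns the
// final counter value. The limit of 16 threads is an arbitrary assumption.
uint64 count_with_atomic_adds(int thread_count, uint64 increments_per_thread)
{
    assert(thread_count > 0 && thread_count <= 16);

    uint64 counter = 0;
    CounterJob job = { &counter, increments_per_thread };
    pthread_t threads[16];

    for (int i = 0; i < thread_count; ++i)
    {
        pthread_create(&threads[i], NULL, add_atomically, &job);
    }
    for (int i = 0; i < thread_count; ++i)
    {
        pthread_join(threads[i], NULL);  // the counter is read only after all joins
    }
    return counter;
}
```

Built with `g++ -pthread`, I would expect `count_with_atomic_adds(2, 1000000)` to return exactly 2000000. If the builtin is swapped for a plain `++*job->counter`, both threads write the same variable concurrently and, as far as I understand, updates can be lost. My sample above is different in that each variable has only one writer.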
I know this is not very scientific, but I ran the sample a few times without the sync functions and it worked totally fine. Furthermore, when I time both versions, with and without the sync functions, they run at the same speed. How come __sync_add_and_fetch is not adding any overhead?
My guess is that the compiler is guaranteeing atomicity for these operations, which is why I don't see a problem in production. But I still cannot explain why __sync_add_and_fetch adds no overhead (even in a debug build).
Some more details about my environment: Ubuntu 10.04, gcc 4.4.3, Intel i5 multicore CPU.
The production environment is similar; it just runs on more powerful CPUs and on CentOS.
Thanks for your help.