When I disassembled my program, I saw that gcc was using jmp instead of call for the second pthread_barrier_wait call when compiled with -O3. Why is that?
What advantage does it get by using jmp instead of call? What trick is the compiler playing here? I guess it's performing tail call optimization.
By the way, I'm using static linking here.
__attribute__ ((noinline)) void my_pthread_barrier_wait(
    volatile int tid, pthread_barrier_t *pbar )
{
    pthread_barrier_wait( pbar );
    if ( tid == 0 )
    {
        if ( !rollbacked )
        {
            take_checkpoint_or_rollback( ++iter == 4 );
        }
    }
    //getcontext( &context[tid] );
    SETJMP( tid );
    asm("addr2jmp:");
    pthread_barrier_wait( pbar );
    // My suspicion was right: gcc was performing tail call optimization,
    // which was interfering with my SETJMP/LONGJMP implementation, so I
    // added a dummy function call here to keep the call above out of
    // tail position.
    dummy_var = dummy_func();
}
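
For reference, here is a minimal sketch of the effect, separate from the code above. The names wait_stub, tail_position, not_tail_position and sink are hypothetical and only stand in for pthread_barrier_wait and its wrapper. With gcc at -O2 or -O3, the call in tail_position is typically emitted as a jmp, because nothing remains to be done in the caller after the callee returns. The volatile store in not_tail_position has to happen after the call returns, so the compiler should emit a regular call there.

```c
/* Hypothetical stand-in for pthread_barrier_wait(); noinline keeps the
 * call visible in the disassembly instead of being folded into callers. */
__attribute__((noinline)) int wait_stub(int *counter)
{
    return ++*counter;
}

/* The call is the last action of the function (tail position), so an
 * optimizing gcc may reuse this frame and emit `jmp wait_stub`. */
__attribute__((noinline)) int tail_position(int *counter)
{
    ++*counter;
    return wait_stub(counter);
}

/* A volatile object: stores to it cannot be dropped or reordered away. */
volatile int sink;

/* The store to `sink` must execute after wait_stub() returns, so the
 * call is not in tail position and cannot be replaced by a plain jmp. */
__attribute__((noinline)) int not_tail_position(int *counter)
{
    ++*counter;
    sink = wait_stub(counter);
    return sink;
}
```

Compiling this with gcc -O3 -c and inspecting the result with objdump -d should show the difference, though the exact output depends on the gcc version and target. If the goal is only to disable the transformation, gcc's -fno-optimize-sibling-calls option is an alternative to adding a dummy call.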