
The following program uses the atomic lock implementation from 'CUDA by Example', but running it freezes my machine. Can someone tell me what's wrong with my program? Thanks a lot.

Yifei

#include <stdio.h>


__global__ void test()
{
    __shared__ int i, mutex;

    if (threadIdx.x == 0) {
       i = 0;
       mutex = 0;
    }
    __syncthreads();

    while( atomicCAS(&mutex, 0, 1) != 0);
    i++;
    printf("thread %d: %d\n", threadIdx.x, i);
    atomicExch(&mutex,0);
}
einpoklum
user11869
  • The above program is just an example of what I'm trying to do. I know I can use atomicAdd in the above case to increment i. However, the critical section is more complicated in my real case. – user11869 Jan 18 '12 at 21:30

1 Answer


Here is a theory. I hope you are familiar with the concept of a warp. All threads within a warp enter the while loop together. Only one of them acquires the lock and exits; the rest keep spinning inside the loop. This creates a divergent branch, and the thread that exited the loop stalls until the branch converges again. Because that thread is the only one that can release the mutex, convergence never happens: it waits for the other threads to converge, while they wait for it to release the lock.
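If that theory holds, one commonly used workaround is to keep the acquire, the critical section, and the release inside the same branch of the loop, so the thread that wins the lock releases it before the warp has to reconverge. The following is only a sketch under that assumption, not a guaranteed fix: the `done` flag, the `volatile` qualifier, the `__threadfence_block()` call, the `main` function, and the launch configuration of one block of 32 threads are my additions and are not part of the question. Whether it avoids the hang can still depend on how the compiler lays out the branch on a given architecture.

```cuda
#include <stdio.h>

__global__ void test()
{
    // volatile so each thread re-reads i from shared memory (assumption,
    // added to keep the compiler from caching the value in a register)
    __shared__ volatile int i;
    __shared__ int mutex;

    if (threadIdx.x == 0) {
        i = 0;
        mutex = 0;
    }
    __syncthreads();

    bool done = false;  // hypothetical per-thread flag, not in the question
    while (!done) {
        // Try to take the lock once per iteration instead of spinning on it.
        if (atomicCAS(&mutex, 0, 1) == 0) {
            // Critical section: runs while this thread holds the lock.
            i++;
            printf("thread %d: %d\n", threadIdx.x, i);

            // Make the update to i visible to the block before releasing.
            __threadfence_block();
            atomicExch(&mutex, 0);  // release inside the same branch
            done = true;
        }
    }
}

int main()
{
    // One block of 32 threads (a single warp), chosen for illustration.
    test<<<1, 32>>>();
    cudaDeviceSynchronize();  // wait for the kernel and flush device printf
    return 0;
}
```

Each thread should print exactly once, but the order in which threads acquire the lock is not defined. On GPUs with independent thread scheduling (Volta and later), the original spin loop may also make progress, so the hang described in the question is most likely on older hardware.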

brano