1

I try to execute Bessel functions( J0(x) as example) using CUDA. Heres the formula enter image description here

I try to get result to within some epsilon value. So here's the code

    __device__ void Bessel_j0(int totalBlocks, int totalThreads, float z, float epsilon, float* result){
            int n = 1;
            *result = 0;

            bool epsilonFlag = true;

            int idx_start;
            int idx_end;

            while(epsilonFlag == true){ 
                initThreadBounds(&idx_start, &idx_end, n, totalBlocks, totalThreads);
                float a_k;
                for (int k = idx_start; k < idx_end; k++) {
                    a_k = m_power((-0.25 * z * z), k)/(m_factorial(k) * m_factorial(k)); 
                    *result += a_k;
                }
                if(a_k < epsilon){
                        epsilonFlag = false;
                }
                n++;
            }
        }

__global__ void J0(int totalBlocks, int totalThreads,  float x, float* result){
        float res = 0;

        Bessel_j0(totalBlocks, totalThreads, 10, 0.01, &res);
        result[(blockIdx.x*totalThreads + threadIdx.x)] = res;
}

__host__ void J0test(){

    const int blocksNum = 32;
    const int threadNum = 32;

    float   *device_resultf; //для устройства
    float   host_resultf[threadNum*blocksNum]   ={0};


    cudaMalloc((void**) &device_resultf, sizeof(float)*threadNum*blocksNum);

    J0<<<blocksNum, threadNum>>>(blocksNum, threadNum, 10, device_resultf); 
    cudaThreadSynchronize();

    cudaMemcpy(host_resultf, device_resultf, sizeof(float)*threadNum*blocksNum, cudaMemcpyDeviceToHost);

    float sum = 0;

    for (int i = 0; i != blocksNum*threadNum; ++i) {
        sum += host_resultf[i];
        printf ("result in %i cell = %f \n", i, host_resultf[i]);
    }
    printf ("Bessel res = %f \n", sum);
    cudaFree(device_resultf);
}
int main(int argc, char* argv[])
{
    J0test();   
}

When I run it appears black screen and Windows says that nVidia driver didn't respond and it recovered it. And in console output there are only zeros in host_resultf array. What's wrong? How can I execute properly functions to within some epsilon?

DanilGholtsman
  • 2,354
  • 4
  • 38
  • 69
  • 1
    Your program is taking too long to execute and you are hitting a windows TDR event. This unloads the GPU driver which halts your program execution. You should see if your program has some logic error causing it to take a very long time, or else see if you can shorten kernel execution. You may also want to try running it on a machine where the CUDA GPU is not also a WDDM GPU running a display under windows. Kernels running on WDDM GPUs are limited to about 2 seconds of execution time before you hit a windows TDR event. – Robert Crovella Apr 07 '13 at 14:56
  • 3
    Please note that the CUDA library already provides the Bessel of the first kind j0, j1, jn, as well as the Bessel functions of the second kind y0, y1, yn. – njuffa Apr 07 '13 at 17:42
  • @njuffa yeah I know, but i got task to made it by myself -_- – DanilGholtsman Apr 07 '13 at 18:04
  • 2
    In that case, you may want to consider strength reducing the series computation to significantly cut down on computation time, for example by in the case of j0 computing a_(k+1) from a_k, by multiplying with -0.25*z*z and dividing by k*k. – njuffa Apr 07 '13 at 18:42
  • @njuffa oh thanks for advice! btw is there any reason why theres always j0, j1 and jv function when we could use just jv for all cases? – DanilGholtsman Apr 08 '13 at 15:26
  • 1
    The CUDA math library follows established standards (such as C99) whereever possible. POSIX defines this group of Bessel functions. I assume it has something to do with the fact that functions of order 0 and 1 can serve as the basis of functions of higher orders, via a well-known recursion. – njuffa Apr 08 '13 at 17:03

1 Answers1

1

It is unlikely, but may be your kernel execution hit the allowable kernel execution time limit. Your code doesn't show an upper limit for iteration numbers. It can happen that epsilon is never reached and your kernel keeps executing beyond the time limits. This site can help.

In all cases, I will add an upper limit to the epsilon loop, never leave a code to run without limit on the iterations number.

Bichoy
  • 351
  • 2
  • 9