I have been getting strange results from integer division in CUDA when using the long long data type. Here's a condensed version of the code.

__global__ void Test(bool * d_test_list){

    long long index = threadIdx.x + blockIdx.x*blockDim.x;
    bool test = false;

    if (index / 25 == 5) //Somehow not true when index == 125? 
    {
        test = true;
    }

    d_test_list[index] = test;
}

After printing out all the elements of d_test_list, 125 does not show up, nor does any other index in the range [125, 149] that should pass the test. My only guess is that this has something to do with how CUDA handles integer types. Something similar happens with the modulus operator, which also gives incorrect results, while +, -, and * all work fine. I am using 1024 threads per block; could that be an issue?

I am using CUDA v6.5 RC, but I'd assume they'd have integer division figured out by now.

Dane Bouchie

1 Answer

Figured it out: I was using too many threads per block. When I decreased the block size from 1024 to 200, the problem went away. Presumably the kernel launch itself was failing at 1024 threads, so nothing was ever written to d_test_list. I think it has to do with the number of registers available on a core, together with division being implemented in software (if it is).

Update: The limit was 896 = 2^10 - 2^7 threads per block for division. For the modulus it was 768 = 2^10 - 2^8.
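One way to check the register explanation rather than guess at it is to ask the runtime how many registers the compiled kernel uses and how many threads per block it can actually be launched with. The following is only a sketch: it assumes device 0 is the GPU in use and reuses the kernel from the question with the test folded into one line. The numbers it prints depend on the GPU and the compiler version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the question, condensed. Only its resource usage matters here.
__global__ void Test(bool *d_test_list) {
    long long index = threadIdx.x + blockIdx.x * blockDim.x;
    d_test_list[index] = (index / 25 == 5);
}

int main() {
    // Per-kernel attributes as compiled for the current device.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, Test);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Device-wide limits. Device 0 is an assumption of this sketch.
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("registers per thread used by Test:   %d\n", attr.numRegs);
    printf("max threads per block for Test:      %d\n", attr.maxThreadsPerBlock);
    printf("registers available per block:       %d\n", prop.regsPerBlock);
    printf("device-wide max threads per block:   %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```

If the per-kernel maximum comes out below 1024, a launch with 1024 threads per block would be expected to fail, which would match the behaviour described above. Compiling with `nvcc -Xptxas -v` also reports the register count at build time.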

Dane Bouchie
  • 3
    Any time you are having trouble with a CUDA code, it's a good idea to add [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). There are many questions on SO that deal with CUDA registers-per-thread limitations; you may also want to read about [launch bounds](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds). Yes, division is implemented in software. – Robert Crovella Aug 13 '14 at 03:00
  • 1
    [This recent question](http://stackoverflow.com/questions/25140077/influence-of-division-operation-in-cuda-kernel-on-number-of-registers-per-thread) may be of interest. – Robert Crovella Aug 13 '14 at 15:02
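For anyone landing here, the two suggestions in the comments (error checking and launch bounds) might look roughly like the sketch below. `CUDA_CHECK` is a hypothetical helper macro, not part of the CUDA API, and the extra `n` parameter with its bounds check is an addition that is not in the original kernel. `__launch_bounds__(1024)` asks the compiler to keep register usage low enough for 1024 threads per block; whether it can do so without spilling depends on the kernel and the GPU.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): abort with a message on any error.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,     \
                    __LINE__, cudaGetErrorString(err_));               \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Launch bounds tell the compiler this kernel will run with up to
// 1024 threads per block, so it should limit registers accordingly.
__global__ void __launch_bounds__(1024) Test(bool *d_test_list, long long n) {
    long long index = threadIdx.x + (long long)blockIdx.x * blockDim.x;
    if (index >= n) return;  // bounds check added for this sketch
    d_test_list[index] = (index / 25 == 5);
}

int main() {
    const int n = 1024;        // number of elements, chosen for illustration
    const int threads = 1024;  // the block size from the question
    const int blocks = (n + threads - 1) / threads;

    bool *d_test_list = NULL;
    CUDA_CHECK(cudaMalloc(&d_test_list, n * sizeof(bool)));

    Test<<<blocks, threads>>>(d_test_list, n);
    CUDA_CHECK(cudaGetLastError());       // catches launch failures
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors during execution

    bool h_test_list[n];
    CUDA_CHECK(cudaMemcpy(h_test_list, d_test_list, n * sizeof(bool),
                          cudaMemcpyDeviceToHost));

    // Print every index that passed the test.
    for (int i = 0; i < n; ++i) {
        if (h_test_list[i]) printf("%d\n", i);
    }

    CUDA_CHECK(cudaFree(d_test_list));
    return 0;
}
```

Without the `cudaGetLastError()` check, a launch that requests too many resources fails silently and the output buffer is simply never written, which looks exactly like "division gives wrong results".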