Is there a limit to how many instructions can be executed on a CUDA thread? For example, if I run the code below, which totals 100 million iterations of an empty loop (100 × 100 × 100 × 100), it never reaches the last printf (I get the same result with a single loop with the same number of iterations). However, if I remove the two inner loops, leaving only 10,000 iterations, it does. What is the cause of this? :)

Shouldn't empty code be able to run forever? The kernel call and the kernel function look like this:

// The call in main()
simStepGPU<<<1, 128>>>();
cudaDeviceSynchronize();


__global__ void simStepGPU() {

    //printf("START OF THREAD");
    for (int it = 0; it < 100; it++) {
        for (int it2 = 0; it2 < 100; it2++) {
            for (int it3 = 0; it3 < 100; it3++) {
                for (int it4 = 0; it4 < 100; it4++) {

                }
            }
        }
    }

    printf("END OF THREAD");
}

This is my first Stack Overflow post, so please be nice.

Henron
  • Maybe you didn't wait long enough? That's a lot of iterations... – Jaa-c Nov 11 '17 at 11:34
  • It's 100 million iterations. Shouldn't take very long on a GPU as it takes 2.5 seconds in Python to execute 100 million empty loops. – Carl Nov 11 '17 at 11:38
  • I had some code executing after the CUDA synchronization, so it seems like the kernel terminates halfway through. (I forgot to mention it, but I also tried putting milestone printouts in the loop, and they never print all the way.) – Henron Nov 11 '17 at 11:38
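
One way to narrow this down is to check the error codes the CUDA runtime returns after the launch and after the synchronization, since a kernel that is aborted partway through reports its failure there rather than printing anything. The following is a minimal sketch, not a diagnosis: it assumes the kernel takes no arguments (matching the call shown in the question), and `cudaOk` is a hypothetical helper name introduced here for illustration. `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` are standard CUDA runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): prints the CUDA error,
// if any, and returns true only when the call succeeded.
static bool cudaOk(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Same loop nest as in the question; assumed here to take no parameters.
__global__ void simStepGPU() {
    for (int it = 0; it < 100; it++) {
        for (int it2 = 0; it2 < 100; it2++) {
            for (int it3 = 0; it3 < 100; it3++) {
                for (int it4 = 0; it4 < 100; it4++) {
                }
            }
        }
    }
    printf("END OF THREAD %d\n", threadIdx.x);
}

int main() {
    simStepGPU<<<1, 128>>>();

    // Errors in the launch itself (bad configuration, etc.) show up here.
    if (!cudaOk(cudaGetLastError(), "kernel launch")) return 1;

    // Errors that occur while the kernel is running show up here,
    // because the launch is asynchronous.
    if (!cudaOk(cudaDeviceSynchronize(), "kernel execution")) return 1;

    return 0;
}
```

If either check prints a message, it names the error the runtime reported; if both pass, the kernel ran to completion as far as the runtime is concerned.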

0 Answers