CUDA instruction limit

Question

Is there a limit to how many instructions can be executed on a CUDA thread? For example, if I run the code below, totalling about 100 million iterations of an empty loop (100^4), it never reaches the last printf (same result with a single loop with the same number of iterations). However, if I shave off the two inner loops, leaving only 10,000 iterations, it does. What is the cause of this? :)

Shouldn't empty code be able to run forever? The kernel call and function look like this:

// The call in main()
simStepGPU<<<1, 128>>>();
cudaDeviceSynchronize();


__global__ void simStepGPU(Particle *array, int len) {

    //printf("START OF THREAD");
    for (int it = 0; it < 100; it++) {
        for (int it2 = 0; it2 < 100; it2++) {
            for (int it3 = 0; it3 < 100; it3++) {
                for (int it4 = 0; it4 < 100; it4++) {

                }
            }
        }
    }

    printf("END OF THREAD");
}

This is my first Stack Overflow post, so please be nice.

Maybe you didn't wait long enough? That's a lot of iterations... — Jaa-c, Nov 11 '17 at 11:34
It's 100 million iterations. Shouldn't take very long on a GPU as it takes 2.5 seconds in Python to execute 100 million empty loops. — Carl, Nov 11 '17 at 11:38
I had some code executing after the cuda synchronization, so it seems like it terminates halfway through (forgot to write it but I also tried having milestone printouts in the loop, which never prints all the way). — Henron, Nov 11 '17 at 11:38
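
One way to narrow this down is to check the error codes the runtime returns, since the launch in the question ignores them. Below is a minimal sketch, assuming a parameterless stand-in for `simStepGPU` and a hypothetical helper named `launchAndCheck` (neither is from the original post). If the kernel is being terminated early, for example by a display watchdog (unconfirmed in this thread), `cudaDeviceSynchronize()` would be expected to return an error such as `cudaErrorLaunchTimeout` rather than `cudaSuccess`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for the kernel in the question (assumption: no parameters,
// loop bodies left empty as in the original).
__global__ void simStepGPU() {
    for (int it = 0; it < 100; it++) {
        for (int it2 = 0; it2 < 100; it2++) {
            for (int it3 = 0; it3 < 100; it3++) {
                for (int it4 = 0; it4 < 100; it4++) {
                }
            }
        }
    }
    printf("END OF THREAD\n");
}

// Hypothetical helper: launches the kernel and returns the first error seen.
cudaError_t launchAndCheck() {
    simStepGPU<<<1, 128>>>();

    // Errors detected at launch time (e.g. an invalid launch configuration).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "Launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Errors raised while the kernel was running surface here.
    // This call also flushes the device-side printf buffer.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
    }
    return err;
}
```

Calling `launchAndCheck()` from `main()` in place of the bare launch plus `cudaDeviceSynchronize()` shows whether the kernel finished normally or was aborted, and the error string usually points at the reason.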
