Is there a limit to how many instructions can be executed by a CUDA thread? For example, if I run the code below, which totals 100 × 100 × 100 × 100 = 100 million iterations of an empty loop, it never reaches the final printf (I get the same result with a single loop of the same number of iterations). However, if I remove the two inner loops, leaving only 10,000 iterations, it does. What is the cause of this? :)
Shouldn't empty code be able to run forever? The kernel call and the kernel look like this:
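// The call in main(). d_particles and numParticles are hypothetical placeholder
// names for the device array and its length; the kernel signature requires
// both arguments, but the original call passed none.
simStepGPU<<<1, 128>>>(d_particles, numParticles);
cudaDeviceSynchronize();

__global__ void simStepGPU(Particle *array, int len) {
    //printf("START OF THREAD\n");
    for (int it = 0; it < 100; it++) {
        for (int it2 = 0; it2 < 100; it2++) {
            for (int it3 = 0; it3 < 100; it3++) {
                for (int it4 = 0; it4 < 100; it4++) {
                }
            }
        }
    }
    printf("END OF THREAD\n");
}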
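To tell whether the kernel is actually being terminated rather than just not printing, the error codes of the launch and of cudaDeviceSynchronize() can be checked, and the device can be asked whether it enforces a kernel execution timeout. The following is a minimal, self-contained sketch under those assumptions. spinKernel, launchAndCheck and kernelTimeoutEnabled are hypothetical names, and the kernel takes a loop count instead of the original Particle arguments. Note also that an optimizing compiler is allowed to remove loops that have no side effects, so the iteration count in the source may not match what actually runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant of the kernel from the question with the per-loop trip count passed
// in as n (hypothetical parameter; the original hard-codes 100), so that
// n = 100 gives 100^4 = 10^8 iterations per thread and n = 10 gives 10^4.
__global__ void spinKernel(int n) {
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            for (int c = 0; c < n; c++)
                for (int d = 0; d < n; d++) {
                    // intentionally empty, as in the question
                }
    printf("END OF THREAD %u\n", threadIdx.x);
}

// Returns true if device `dev` reports that kernels are subject to a run-time
// limit (typically a display watchdog on a GPU that also drives a monitor).
// If the query itself fails, `enabled` stays 0 and false is returned.
bool kernelTimeoutEnabled(int dev) {
    int enabled = 0;
    cudaDeviceGetAttribute(&enabled, cudaDevAttrKernelExecTimeout, dev);
    return enabled != 0;
}

// Launches spinKernel with one block of 128 threads, as in the question, and
// reports launch errors as well as errors raised while the kernel runs.
// Returns the first error seen, or cudaSuccess.
cudaError_t launchAndCheck(int n) {
    spinKernel<<<1, 128>>>(n);
    cudaError_t err = cudaGetLastError();      // bad launch configuration, etc.
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }
    err = cudaDeviceSynchronize();             // errors during execution
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }
    return err;
}
```

For example, calling kernelTimeoutEnabled(0) and then launchAndCheck(100) on the first device would show whether a watchdog is active and whether the 10^8-iteration run ends with an error (such as cudaErrorLaunchTimeout, which is only reported when timeouts are enabled) instead of printing.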
This is my first Stack Overflow post, so please be nice.