
I have the following code snippet:

__constant__ int baseLineX[4000];
__constant__ int baseLineY[4000];
__constant__ int guideLineX[4000];
__constant__ int guideLineY[4000];
__constant__ int rectangleOffsets[8];

__constant__ float blurKernel[64];

<other code>

for(int i = 0; i < 8; i++)
    hostRectangleOffsets[i] = i;

cudaMemcpyToSymbol(rectangleOffsets, hostRectangleOffsets, 8*sizeof(int));

This code works fine on a Tesla K40 but not on a 16GB Tesla V100. (Even my laptop can run the code on a 4GB Quadro M2200 GPU.)

The code just hangs on the V100 and never returns from the cudaMemcpyToSymbol call, but it looks like the call is still being processed on the GPU. Any ideas?

Aaron

1 Answer


Well, you haven't provided a Minimal, Complete, and Verifiable example: your code doesn't compile, is missing statements, and yet contains (apparently) irrelevant ones. So nobody can actually check.

I can still make several suggestions though:

  1. Try using the asynchronous version of this call: cudaMemcpyToSymbolAsync(). At least your program won't hang (see the sketch after this list).
  2. Run your program in a debugger to begin with (e.g. NVIDIA's Nsight on most systems, or their Visual Studio extension on Windows); alternatively, attach a debugger to the hanging process (MSVS instructions, Eclipse instructions - old).
  3. Run the process with core dumps enabled (if you're on a Unix-ish system), kill it when it hangs, then open the core dump in a debugger - you'll at least get the backtrace.
  4. Try rebuilding your program with fewer optimizations enabled - this sometimes helps, at least for diagnostic purposes (and can be combined with the previous suggestions).
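
For the first suggestion, here is a minimal sketch of what that could look like, assuming the rectangleOffsets symbol and host loop from your snippet (the pinned host allocation, the stream, and the cudaStreamQuery polling loop are my additions, not anything from your code):

#include <cstdio>

__constant__ int rectangleOffsets[8];

int main() {
    // Pinned host memory, so the async copy can actually proceed asynchronously.
    int *hostRectangleOffsets;
    cudaMallocHost(&hostRectangleOffsets, 8 * sizeof(int));
    for (int i = 0; i < 8; i++)
        hostRectangleOffsets[i] = i;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Enqueue the copy without blocking the host thread.
    cudaMemcpyToSymbolAsync(rectangleOffsets, hostRectangleOffsets,
                            8 * sizeof(int), 0,
                            cudaMemcpyHostToDevice, stream);

    // Poll instead of blocking, so a hang is observable from the host side.
    while (cudaStreamQuery(stream) == cudaErrorNotReady) {
        // the host stays responsive here; you can log, sleep, or give up after a timeout
    }

    cudaError_t status = cudaGetLastError();
    if (status != cudaSuccess)
        printf("copy failed: %s\n", cudaGetErrorString(status));

    cudaStreamDestroy(stream);
    cudaFreeHost(hostRectangleOffsets);
    return 0;
}

This won't fix whatever is going wrong on the V100, but it turns a silent hang into something you can observe, time, and bail out of from the host side.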
einpoklum