
I have the following code snippet:

__constant__ int baseLineX[4000];
__constant__ int baseLineY[4000];
__constant__ int guideLineX[4000];
__constant__ int guideLineY[4000];
__constant__ int rectangleOffsets[8];

__constant__ float blurKernel[64];

<other code>

for(int i = 0; i < 8; i++)
    hostRectangleOffsets[i] = i;

cudaMemcpyToSymbol(rectangleOffsets, hostRectangleOffsets, 8*sizeof(int));

This code works fine on a Tesla K40 but not on a 16GB Tesla V100. (Even my laptop can run the code on a 4GB Quadro M2200 GPU.)

The code just hangs on the V100 and never returns from the cudaMemcpyToSymbol call, but it looks like the call is still being processed on the GPU. Any ideas?

Aaron

1 Answer


Well, you haven't provided a Minimal, Complete, and Verifiable example: your code doesn't compile, is missing statements, and yet contains (apparently) irrelevant ones. So nobody can actually check.

I can still make several suggestions though:

  1. Try using the asynchronous version of this call: cudaMemcpyToSymbolAsync(). At least your program won't hang (see the sketch after this list).
  2. Run your program in a debugger to begin with (e.g. NVIDIA's Nsight on most systems, or their Visual Studio extension on Windows); alternatively, attach a debugger to the hanging process (MSVS instructions, Eclipse instructions - old).
  3. Run the process with core dumps enabled (if you're on a Unix-ish system), kill it when it hangs, then open the core dump in a debugger - you'll at least get the backtrace.
  4. Try rebuilding your program with fewer optimizations enabled - this sometimes helps, at least for diagnostic purposes (and can be combined with the previous suggestions).
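
For the first suggestion, here is a minimal sketch of what that could look like, assuming the rectangleOffsets symbol and host loop from your snippet (the pinned host allocation, the stream, and the cudaStreamQuery polling loop are my additions, not anything from your code):

#include <cstdio>

__constant__ int rectangleOffsets[8];

int main() {
    // Pinned host memory, so the async copy can actually proceed asynchronously.
    int *hostRectangleOffsets;
    cudaMallocHost(&hostRectangleOffsets, 8 * sizeof(int));
    for (int i = 0; i < 8; i++)
        hostRectangleOffsets[i] = i;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Enqueue the copy without blocking the host thread.
    cudaMemcpyToSymbolAsync(rectangleOffsets, hostRectangleOffsets,
                            8 * sizeof(int), 0,
                            cudaMemcpyHostToDevice, stream);

    // Poll instead of blocking, so a hang is observable from the host side.
    while (cudaStreamQuery(stream) == cudaErrorNotReady) {
        // the host stays responsive here; you can log, sleep, or give up after a timeout
    }

    cudaError_t status = cudaGetLastError();
    if (status != cudaSuccess)
        printf("copy failed: %s\n", cudaGetErrorString(status));

    cudaStreamDestroy(stream);
    cudaFreeHost(hostRectangleOffsets);
    return 0;
}

This won't fix whatever is going wrong on the V100, but it turns a silent hang into something you can observe, time, and bail out of from the host side.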
einpoklum