I have the following code snippet:
__constant__ int baseLineX[4000];
__constant__ int baseLineY[4000];
__constant__ int guideLineX[4000];
__constant__ int guideLineY[4000];
__constant__ int rectangleOffsets[8];
__constant__ float blurKernel[64];
<other code>
for (int i = 0; i < 8; i++)
    hostRectangleOffsets[i] = i;
cudaMemcpyToSymbol(rectangleOffsets, hostRectangleOffsets, 8*sizeof(int));
This code works fine on a Tesla K40 but not on a 16GB Tesla V100. (Even my laptop, which has a 4GB Quadro M2200 GPU, can run it.)
On the V100 the code just hangs: it never returns from the cudaMemcpyToSymbol call, although it looks like something is still being processed on the GPU. Any ideas?
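To narrow down where the stall happens, the copy could be isolated and error-checked along these lines. This is only a sketch: the CHECK_CUDA macro and the uploadRectangleOffsets function are hypothetical names that are not part of the original program, and the cudaFree(0) call is included on the assumption that runtime start-up (context creation) may account for some of the delay rather than the copy itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__constant__ int rectangleOffsets[8];

// Hypothetical helper: report a failing CUDA call with its source location
// and return the error code from the enclosing function.
#define CHECK_CUDA(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            return err_;                                               \
        }                                                              \
    } while (0)

// Hypothetical wrapper around the copy from the snippet above.
cudaError_t uploadRectangleOffsets()
{
    int hostRectangleOffsets[8];
    for (int i = 0; i < 8; i++)
        hostRectangleOffsets[i] = i;

    // Assumption: the runtime initializes lazily, so forcing context
    // creation here may separate start-up time from the copy itself.
    CHECK_CUDA(cudaFree(0));
    fprintf(stderr, "context ready\n");

    CHECK_CUDA(cudaMemcpyToSymbol(rectangleOffsets, hostRectangleOffsets,
                                  sizeof(hostRectangleOffsets)));
    fprintf(stderr, "copy to constant memory done\n");

    // Read the values back to confirm the copy actually took effect.
    int readBack[8] = {0};
    CHECK_CUDA(cudaMemcpyFromSymbol(readBack, rectangleOffsets,
                                    sizeof(readBack)));
    for (int i = 0; i < 8; i++)
        if (readBack[i] != i)
            return cudaErrorUnknown;

    return cudaSuccess;
}

int main()
{
    return uploadRectangleOffsets() == cudaSuccess ? 0 : 1;
}
```

The progress messages go to stderr so they appear immediately. Which one is the last to print before the hang would show whether the time is spent before the copy or inside it.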