CUDA cuCtxSynchronize Error using pyCuda 2016.1.2

Question

I have a fairly simple PyCUDA script that is supposed to load a grayscale image of a truck (2048x1365), invert the colors, and save it back:

import pycuda.autoinit
import pycuda.driver as device
from pycuda.compiler import SourceModule as cpp

import numpy as np
import cv2

modify_image = cpp("""
__global__ void modify_image(int pixelcount, int* inputimage, int* outputimage)
{
  int id = threadIdx.x + blockIdx.x * blockDim.x;
  if (id >= pixelcount)
    return;

  outputimage[id] = 255 - inputimage[id];
}
""").get_function("modify_image")

print("Loading image")

image = cv2.imread("truck.jpg", cv2.IMREAD_GRAYSCALE)

print("Processing image")

pixels = image.shape[0] * image.shape[1]
output = np.zeros_like(image)
modify_image(
  device.In(np.int32(pixels)),
  device.In(image), 
  device.Out(output),
  block=(1024,1,1), grid=(pixels // 1024, 1))

print("Saving image")

cv2.imwrite("processed.png", output)

print("Done")

However, when I run it, I get a cuCtxSynchronize error during execution. I made some small modifications to the code and did further testing (https://hastebin.com/fucugucawe.py), and found that it works fine up to a point, but trying to allocate more than around 420,000 bytes causes the error.

I used MSI Afterburner to monitor my GPU's memory usage, and it never goes above 1GB, even while the program is running. My GPU is a GTX 980 with 4GB of VRAM, so I shouldn't be anywhere close to its limit. If anybody knows what's going on here, I would really appreciate the help.

Here is the exact output produced by the program:

Loading image
Processing image
Traceback (most recent call last):
  File "<path to source file>\imfiltertest.py", line 36, in <module>
    block=(1024,1,1), grid=(pixels // 1024, 1))
  File "C:\Users\<me>\AppData\Local\Programs\Python\Python36\lib\site-packages\pycuda\driver.py", line 405, in function_call
    Context.synchronize()
pycuda._driver.LaunchError: cuCtxSynchronize failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: unspecified launch failure
[Finished in 0.7s]

I've already tried disabling Windows' TDR, but it had no effect on the issue.

"cuCtxSynchronize Error" is a runtime error, not a compiler error. — talonmies, May 18 '17 at 21:05
may be hitting a wddm tdr timeout. Anyway, you are supposed to provide a [mcve] in the question itself, not in an external link. — Robert Crovella, May 18 '17 at 21:10
Okay, I replaced the first link with the actual code. What do you mean by a wddm tdr timeout, and how might I work around it? — Maurdekye, May 18 '17 at 21:12
If it's anything to do with the runtime of the program, I noticed that it was 0.7 seconds when it ran successfully, and 0.8 when it failed. But 0.8 seconds seems like an absurdly short timeout interval. — Maurdekye, May 18 '17 at 21:14
I went into the registry and added the key `Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrLevel` set to 0. But the error still occurs in the same manner. — Maurdekye, May 18 '17 at 21:26
if your pixels are bytes or anything less than an `int` quantity (`IMREAD_GRAYSCALE` converts your image into a single channel!), your code will perform out-of-bounds accesses. Try converting the kernel code to use `unsigned char *` pointer parameters rather than `int *` — Robert Crovella, May 19 '17 at 03:39
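
The last comment's suggestion can be sketched roughly as follows. This is a minimal, untested-on-the-asker's-machine sketch, assuming the image really is `uint8` (which is what `cv2.IMREAD_GRAYSCALE` returns for an 8-bit JPEG). The helper names `bytes_copied`, `bytes_indexed_as_int`, `blocks_needed` and `invert_on_gpu` are hypothetical and not part of the original script. The sizing helpers show why the `int*` kernel walks past the end of the buffer: `device.In(image)` uploads one byte per pixel, but the kernel indexes four bytes per pixel. The sketch also passes the pixel count by value as `np.int32` instead of wrapping it in `device.In`, which, as far as I understand PyCUDA, is how scalar kernel arguments are meant to be passed; treat that part as an assumption rather than a confirmed diagnosis.

```python
# Sketch only: sizes come from the question; helper names are hypothetical.
WIDTH, HEIGHT = 2048, 1365   # image dimensions stated in the question
BYTES_PER_PIXEL = 1          # assumption: uint8 pixels from cv2.IMREAD_GRAYSCALE
BYTES_PER_INT = 4            # sizeof(int) in CUDA device code


def bytes_copied(pixels):
    """Bytes that device.In(image) uploads for a uint8 image."""
    return pixels * BYTES_PER_PIXEL


def bytes_indexed_as_int(pixels):
    """Bytes the original kernel spans when it indexes the buffer as int*."""
    return pixels * BYTES_PER_INT


def blocks_needed(pixels, block_size=1024):
    """Ceiling division so a partial final block is still launched."""
    return (pixels + block_size - 1) // block_size


# Kernel with unsigned char* parameters, as suggested in the comments.
KERNEL_SOURCE = """
__global__ void modify_image(int pixelcount,
                             const unsigned char* inputimage,
                             unsigned char* outputimage)
{
  int id = threadIdx.x + blockIdx.x * blockDim.x;
  if (id >= pixelcount)
    return;

  outputimage[id] = 255 - inputimage[id];
}
"""


def invert_on_gpu(image):
    """Invert a contiguous uint8 NumPy array on the GPU (needs CUDA + PyCUDA)."""
    # Imported lazily so the sizing helpers above work without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.driver as device
    from pycuda.compiler import SourceModule

    kernel = SourceModule(KERNEL_SOURCE).get_function("modify_image")
    output = np.zeros_like(image)
    # Assumption: the scalar goes in by value as np.int32, not via device.In.
    kernel(np.int32(image.size), device.In(image), device.Out(output),
           block=(1024, 1, 1), grid=(blocks_needed(image.size), 1))
    return output
```

For the 2048x1365 image, `bytes_copied` gives 2,795,520 while `bytes_indexed_as_int` gives 11,182,080, so roughly three quarters of the threads in the original kernel would read and write outside the allocation. With the file from the question, usage would be along the lines of `output = invert_on_gpu(cv2.imread("truck.jpg", cv2.IMREAD_GRAYSCALE))`.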

