I am very new to CUDA programming and am starting with PyCUDA to learn the basics. I studied the tutorials and ran a couple of simple tests, all of which used only 1D arrays. When I run the following code with a 2D array, I keep getting a PyCUDA warning that a clean-up operation failed due to a misaligned address.
import pycuda.autoinit
import pycuda.driver as drv
import numpy as np
from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void multiply(float **dest) {
    const int i = threadIdx.x;
    const int j = threadIdx.y;
    dest[i][j] = 2.0*dest[i][j];
}
""")
a = np.random.randn(32, 32).astype(np.float32)
multiply = mod.get_function("multiply")
multiply(drv.InOut(a), block=(32,32,1), grid=(1,1))
print(a)
The error that I get when I run the above script is:
Traceback (most recent call last):
  File "cudaTest.py", line 16, in <module>
    multiply(drv.InOut(a), block=(32,32,1), grid=(1,1))
  File "/users/gpu/local/python3.3/lib/python3.6/site-packages/pycuda-2016.1.2-py3.6-linux-x86_64.egg/pycuda/driver.py", line 405, in function_call
    Context.synchronize()
pycuda._driver.LogicError: cuCtxSynchronize failed: misaligned address
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: misaligned address
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: misaligned address
I have checked other questions on SO and found a similar one. Following the answer given there, I tried specifying the size of the array a, but to no avail.
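For reference, here is a minimal GPU-free sketch (not part of the failing script) of how the array a is laid out on the host, assuming NumPy's default C-contiguous order. It only checks that the 32x32 array is one flat, row-major buffer of floats, so element (i, j) sits at flat index i * n_cols + j. The names flat, n_cols and flat_index are illustrative and do not appear in the script above.

```python
import numpy as np

a = np.random.randn(32, 32).astype(np.float32)

# NumPy's default layout is row-major (C order): one contiguous block of floats,
# not an array of pointers to separate rows.
assert a.flags["C_CONTIGUOUS"]

flat = a.ravel()        # for a contiguous array this is a view over the same buffer
n_cols = a.shape[1]     # number of elements per row

i, j = 3, 5             # arbitrary example indices
flat_index = i * n_cols + j

# The flat element and the 2D element are the same value in the same buffer.
same_element = bool(flat[flat_index] == a[i, j])
print(same_element)
print(a.strides)        # bytes to step to the next row and to the next column
```

If this flat layout is also what reaches the kernel, then indexing a single float pointer as dest[i * 32 + j] may be closer to the actual memory than the double-pointer form dest[i][j], though I have not confirmed that this is the cause of the error.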
I am running this on a cluster with two NVIDIA Tesla K10 GPUs. Since I have no root access, I had to install Python 3 locally and add NumPy, PyCUDA, etc. to the local installation. The cluster runs Ubuntu 12.04.1 LTS. I am using Python 3.6.0 with PyCUDA 2016.1.2 and CUDA 6.0.