I'm trying to implement a simple multiplication operation using CUDA with Numba. I'm running into an odd problem in Numba, and I hope you can help me in some way.

This is the code I'm using:

import numpy as np
from timeit import default_timer as timer
from numba import jit, guvectorize, int32, int64, float64
from numba import cuda

@guvectorize(['void(int32[:,:], int32[:,:])'], '(m,n)->(m,n)', target='cuda', nopython=True)
def f_vec_loops(x, ret):
    nx = len(ret)
    ny = len(ret[0])
    for k in range(1000):
        for i in range(nx):
            for j in range(ny):
                ret[i, j] += x[i, j]


x = 300
y = 400    
a = np.ones([x, y], dtype='int32')
ret = np.zeros([x, y], dtype='int32')

a_cuda = cuda.to_device(a)
ret_cuda = cuda.to_device(ret)

s = timer()
f_vec_loops(a_cuda, ret_cuda)
e = timer()
print(e-s)

hary = ret_cuda.copy_to_host()
print(hary)

It works well for small values of x and y, e.g. 30 and 40. However, when I increase them, for example to 300 and 400 as in the code above, I receive this error:

Traceback (most recent call last):
  File "test.py", line 29, in <module>
    hary = ret_cuda.copy_to_host()
  File "C:\ProgramData\Anaconda3\envs\ncuda\lib\site-packages\numba\cuda\cudadrv\devices.py", line 212, in _require_cuda_context
    return fn(*args, **kws)
  File "C:\ProgramData\Anaconda3\envs\ncuda\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 237, in copy_to_host
    _driver.device_to_host(hostary, self, self.alloc_size, stream=stream)
  File "C:\ProgramData\Anaconda3\envs\ncuda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1606, in device_to_host
    fn(host_pointer(dst), device_pointer(src), size, *varargs)
  File "C:\ProgramData\Anaconda3\envs\ncuda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 288, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "C:\ProgramData\Anaconda3\envs\ncuda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 323, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [719] Call to cuMemcpyDtoH results in UNKNOWN_CUDA_ERROR

The problem seems to be in the call to copy_to_host(), but I have to admit I don't understand what is going wrong. I'm working on a Windows 10 PC with an Nvidia GeForce GTX 970.
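
To give a sense of scale, here is a small plain-Python sketch that counts how many innermost `ret[i, j] += x[i, j]` updates the loops perform at each size. The helper name `loop_iterations` is made up for illustration, and the count assumes the function body runs once over the whole (m, n) array, as the '(m,n)->(m,n)' layout suggests. It only quantifies the work and is not a diagnosis of the error.

```python
def loop_iterations(nx, ny, repeats=1000):
    """Count the innermost updates done by the three nested loops.

    `repeats` mirrors the `range(1000)` outer loop in f_vec_loops;
    nx and ny are the array dimensions (hypothetical helper).
    """
    return repeats * nx * ny


small = loop_iterations(30, 40)     # the size that works
large = loop_iterations(300, 400)   # the size that fails

print(small)           # 1200000
print(large)           # 120000000
print(large // small)  # 100
```

So the failing case does 100 times as many updates as the working one. Since `a` is all ones and `ret` starts at zero, every element of the result should be 1000 at either size.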

Thanks in advance for any help.