
I want to execute a CUDA kernel in Python using the NumbaPro API. I have this code:

import math
import numpy
from timeit import default_timer as timer
from numbapro import jit, cuda, int32, float32
from matplotlib import pyplot

@cuda.jit('void(float32[:], float32[:], float32[:], float32[:], float32, float32, float32, int32)')
def calculate_velocity_field(X, Y, u_source, v_source, x_source, y_source, strength_source, N):
    start  = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
    end    = N
    stride = cuda.gridDim.x * cuda.blockDim.x
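    # grid-stride loop: each thread handles every stride-th element, starting at its global index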
    for i in range(start, end, stride):
        u_source[i] = strength_source/(2*math.pi) * (X[i]-x_source)/((X[i]-x_source)**2 + (Y[i]-y_source)**2)
        v_source[i] = strength_source/(2*math.pi) * (Y[i]-y_source)/((X[i]-x_source)**2 + (Y[i]-y_source)**2)


def main():
    N = 200                                # number of points in each direction
    x_start, x_end = -4.0, 4.0            # boundaries in the x-direction
    y_start, y_end = -2.0, 2.0            # boundaries in the y-direction
    x = numpy.linspace(x_start, x_end, N)    # creates a 1D-array with the x-coordinates
    y = numpy.linspace(y_start, y_end, N)    # creates a 1D-array with the y-coordinates

    X, Y = numpy.meshgrid(x, y)              # generates a mesh grid

    strength_source = 5.0                      # source strength
    x_source, y_source = -1.0, 0.0             # location of the source

    start = timer()

    #calculate grid dimensions (the mesh holds N*N points in total)
    blockSize = 1024
    gridSize  = int(math.ceil(float(N*N)/blockSize))
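    # the grid-stride loop in the kernel would also cope with a smaller grid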

    #transfer memory to device (flattened and cast to float32 to match the kernel signature)
    X_d        = cuda.to_device(X.ravel().astype(numpy.float32))
    Y_d        = cuda.to_device(Y.ravel().astype(numpy.float32))
    u_source_d = cuda.device_array(N*N, dtype=numpy.float32)
    v_source_d = cuda.device_array(N*N, dtype=numpy.float32)

    #launch kernel
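    # launch syntax: kernel[blocks_per_grid, threads_per_block](arguments)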
    calculate_velocity_field[gridSize,blockSize](X_d,Y_d,u_source_d,v_source_d,x_source,y_source,strength_source,N*N)

    #transfer memory back to host and restore the 2D mesh shape
    u_source = u_source_d.copy_to_host().reshape(X.shape)
    v_source = v_source_d.copy_to_host().reshape(Y.shape)

    elapsed_time = timer() - start
    print("Exec time with GPU %f s" % elapsed_time)

if __name__ == "__main__":
    main()

It gives me this error:

NvvmError                                 Traceback (most recent call last)
<ipython-input-17-85e4a6e56a14> in <module>()
----> 1 @cuda.jit('void(float32[:], float32[:], float32[:], float32[:], float32, float32, float32, int32)')
      2 def calculate_velocity_field(X, Y, u_source, v_source, x_source, y_source, strength_source, N):
      3     start  = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
      4     end    = N
      5     stride = cuda.gridDim.x * cuda.blockDim.x

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/decorators.py in kernel_jit(func)
     89             # Force compilation for the current context
     90             if bind:
---> 91                 kernel.bind()
     92 
     93             return kernel

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/compiler.py in bind(self)
    319         Force binding to current CUDA context
    320         """
--> 321         self._func.get()
    322 
    323     @property

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/compiler.py in get(self)
    254         cufunc = self.cache.get(device.id)
    255         if cufunc is None:
--> 256             ptx = self.ptx.get()
    257 
    258             # Link

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/compiler.py in get(self)
    226             arch = nvvm.get_arch_option(*cc)
    227             ptx = nvvm.llvm_to_ptx(self.llvmir, opt=3, arch=arch,
--> 228                                    **self._extra_options)
    229             self.cache[cc] = ptx
    230             if config.DUMP_ASSEMBLY:

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in llvm_to_ptx(llvmir, **opts)
    420     cu.add_module(llvmir.encode('utf8'))
    421     cu.add_module(libdevice.get())
--> 422     ptx = cu.compile(**opts)
    423     return ptx
    424 

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in compile(self, **options)
    211                                           for x in opts])
    212         err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)
--> 213         self._try_error(err, 'Failed to compile\n')
    214 
    215         # get result

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in _try_error(self, err, msg)
    229 
    230     def _try_error(self, err, msg):
--> 231         self.driver.check_error(err, "%s\n%s" % (msg, self.get_log()))
    232 
    233     def get_log(self):

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in check_error(self, error, msg, exit)
    118                 sys.exit(1)
    119             else:
--> 120                 raise exc
    121 
    122 

NvvmError: Failed to compile

libnvvm : error: -arch=compute_52 is an unsupported option
NVVM_ERROR_INVALID_OPTION

I tried other NumbaPro examples and the same error occurs. I don't know whether it's a bug in NumbaPro (does it lack support for compute capability 5.2?) or a problem with NVIDIA's NVVM... any suggestions?

In theory compute capability 5.2 should be supported, but I don't know what is happening.

I'm using Linux with CUDA 7.0 and driver version 346.29.

  • Are you sure that you only have CUDA 7.0 installed and no other CUDA toolkit? Or that Anaconda hasn't installed its own version of CUDA as well. Compute capability 5.2 devices are supported in the CUDA 7.0 release toolkit, but not in earlier CUDA versions. That error suggests you are using an earlier toolkit. You might need to talk to Continuum Analytics about this – talonmies Jul 17 '15 at 08:52
  • Yes I have CUDA 7.0 installed. I suspect that Anaconda supplies his own version of CUDA.. if I execute "check_cuda()" it shows paths like that: ~/.anaconda3/lib/libcublas.so.6.0.37 – Hopobcn Jul 17 '15 at 10:50
  • OK so you are actually using CUDA 6, which doesn't support Maxwell cards. You probably need to talk to the vendor about how to use CUDA 7 with their product – talonmies Jul 17 '15 at 11:17

1 Answer


Finally I found a solution here

  • Solution 1:

conda update cudatoolkit

Fetching package metadata: ....
# All requested packages already installed.
# packages in environment at ~/.anaconda3:
#
cudatoolkit               6.0                          p0

It looks like updating the CUDA toolkit this way doesn't upgrade it to CUDA 7.0; the default channel stays on 6.0. A second approach works:

  • Solution 2:

conda install -c numba cudatoolkit

Fetching package metadata: ...... 
Solving package specifications: .
Package plan for installation in environment ~/.anaconda3:

The following packages will be downloaded:
     package                    |            build
    ---------------------------|-----------------
    cudatoolkit-7.0            |                1       190.8 MB

The following packages will be UPDATED:

    cudatoolkit: 6.0-p0 --> 7.0-1

Proceed ([y]/n)? y

Before:

In [4]: check_cuda()
------------------------------libraries detection-------------------------------
Finding cublas
    located at ~/.anaconda3/lib/libcublas.so.6.0.37
    trying to open library...   ok
Finding cusparse
    located at ~/.anaconda3/lib/libcusparse.so.6.0.37
    trying to open library...   ok
Finding cufft
    located at ~/.anaconda3/lib/libcufft.so.6.0.37
    trying to open library...   ok
Finding curand
    located at ~/.anaconda3/lib/libcurand.so.6.0.37
    trying to open library...   ok
Finding nvvm
    located at ~/.anaconda3/lib/libnvvm.so.2.0.0
    trying to open library...   ok
    finding libdevice for compute_20... ok
    finding libdevice for compute_30... ok
    finding libdevice for compute_35... ok
-------------------------------hardware detection-------------------------------
Found 1 CUDA devices
id 0      b'GeForce GTX 970'                              [SUPPORTED]
                      compute capability: 5.2
                           pci device id: 0
                              pci bus id: 7
Summary:
    1/1 devices are supported
PASSED
Out[4]: True

After:

In [6]:  check_cuda()
------------------------------libraries detection-------------------------------
Finding cublas
    located at ~/.anaconda3/lib/libcublas.so.7.0.28
    trying to open library...   ok
Finding cusparse
    located at ~/.anaconda3/lib/libcusparse.so.7.0.28
    trying to open library...   ok
Finding cufft
    located at ~/.anaconda3/lib/libcufft.so.7.0.35
    trying to open library...   ok
Finding curand
    located at ~/.anaconda3/lib/libcurand.so.7.0.28
    trying to open library...   ok
Finding nvvm
    located at ~/.anaconda3/lib/libnvvm.so.3.0.0
    trying to open library...   ok
    finding libdevice for compute_20... ok
    finding libdevice for compute_30... ok
    finding libdevice for compute_35... ok
-------------------------------hardware detection-------------------------------
Found 1 CUDA devices
id 0      b'GeForce GTX 970'                              [SUPPORTED]
                      compute capability: 5.2
                           pci device id: 0
                              pci bus id: 7
Summary:
    1/1 devices are supported
PASSED
Out[6]:  True
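
With the CUDA 7.0 libraries in place, NVVM accepts the -arch=compute_52 option and the kernel from the question compiles. As a quick sanity check, a minimal kernel along these lines should now compile and run without the NvvmError (the kernel name and test data are illustrative, not from the original code):

import numpy
from numbapro import cuda

@cuda.jit('void(float32[:])')
def smoke_test(a):
    i = cuda.grid(1)        # absolute 1D thread index
    if i < a.shape[0]:      # guard threads beyond the array length
        a[i] = 2.0 * a[i]

a = numpy.arange(16, dtype=numpy.float32)
d_a = cuda.to_device(a)
smoke_test[1, 32](d_a)      # one block of 32 threads
d_a.copy_to_host(a)
print(a)                    # every element doubled => NVVM compiled for sm_52

If the old 6.0 libraries are still the ones being picked up, the same NvvmError about compute_52 comes back, so re-running check_cuda() is the first thing to do after changing the environment.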