I am trying to run a simple Theano script on Ubuntu 16.04 with CUDA 8.0 and an NVIDIA GTX 1060 GPU, inside a Python virtual environment created by Anaconda. The following is my .theanorc file:
[global]
floatX = float32
device = cuda
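For reference, I have read in the Theano docs that the cuDNN header and library locations can be passed explicitly via a [dnn] section, like the sketch below. The /usr/local/cuda paths are only my assumption about a typical system-level install; I have not added this section yet:

[global]
floatX = float32
device = cuda

[dnn]
include_path = /usr/local/cuda/include
library_path = /usr/local/cuda/lib64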
The code I am trying to run is a short sample from the Theano website:
from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
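For what it's worth, the device and float type Theano actually resolved from .theanorc can also be printed directly. This is just a sanity-check snippet I use, not part of the sample:

import theano

# Prints the device and floatX Theano resolved from .theanorc
# ('cuda' and 'float32' in my setup).
print(theano.config.device)
print(theano.config.floatX)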
When I run the code, I get a bunch of warnings and the following errors:
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 1, 'for cmd', 'nvcc -shared -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/eb/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -I/home/eb/anaconda2/envs/deep/lib/python2.7/site-packages/theano/sandbox/cuda -I/home/eb/anaconda2/envs/deep/lib/python2.7/site-packages/numpy/core/include -I/home/eb/anaconda2/envs/deep/include/python2.7 -I/home/eb/anaconda2/envs/deep/lib/python2.7/site-packages/theano/gof -L/home/eb/anaconda2/envs/deep/lib -o /home/eb/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray/cuda_ndarray.so mod.cu -lcublas -lpython2.7 -lcudart')
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
/tmp/try_flags_M8OZOh.c:4:19: fatal error: cudnn.h: No such file or directory
compilation terminated.
Mapped name None to device cuda: GeForce GTX 1060 6GB (0000:01:00.0)
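The fatal error suggests nvcc cannot see cudnn.h. As a quick check, one can look for the header from Python; both paths below are only my guesses at the usual install locations on Ubuntu:

import os.path

# Typical locations for the cuDNN header on Ubuntu; both paths are assumptions.
for path in ('/usr/local/cuda/include/cudnn.h', '/usr/include/cudnn.h'):
    print(path, os.path.exists(path))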
Surprisingly, the code RUNS and prints the desired output:
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.365814 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
1.62323296]
Used the gpu
I was wondering: am I missing a Theano config setting? Any idea what's going wrong? I notice the failed compile comes from theano.sandbox.cuda (the old backend), while the "Mapped name None to device cuda" line mentions gpuarray, so perhaps only the old backend is failing, but I'm not sure.
P.S. All libraries are installed in my Python virtual environment, except for the CUDA library, which is installed at the system level. Thanks!