I am trying to use the GPU with Theano. I've read this tutorial.
However, I can't get Theano to use the GPU and I don't know how to continue.
Testing machine
$ cat /etc/issue
Welcome to openSUSE 12.1 "Asparagus" - Kernel \r (\l).
$ nvidia-smi -L
GPU 0: Tesla C2075 (S/N: 0324111084577)
$ echo $LD_LIBRARY_PATH
/usr/local/cuda-5.0/lib64:[other]:/usr/local/lib:/usr/lib:/usr/local/X11/lib:[other]
$ find /usr/local/ -name cuda_runtime.h
/usr/local/cuda-5.0/include/cuda_runtime.h
$ echo $C_INCLUDE_PATH
/usr/local/cuda-5.0/include/
$ echo $CXX_INCLUDE_PATH
/usr/local/cuda-5.0/include/
$ nvidia-smi -a
NVIDIA: could not open the device file /dev/nvidiactl (Permission denied).
Failed to initialize NVML: Insufficient Permissions
$ echo $PATH
/usr/lib64/mpi/gcc/openmpi/bin:/home/mthoma/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:.:/home/mthoma/bin
$ ls -l /dev/nv*
crw-rw---- 1 root video 195, 0 1. Jul 09:47 /dev/nvidia0
crw-rw---- 1 root video 195, 255 1. Jul 09:47 /dev/nvidiactl
crw-r----- 1 root kmem 10, 144 1. Jul 09:46 /dev/nvram
# nvidia-smi -a
==============NVSMI LOG==============

Timestamp : Wed Jul 30 05:13:52 2014
Driver Version : 304.33

Attached GPUs : 1

GPU 0000:04:00.0
    Product Name : Tesla C2075
    Display Mode : Enabled
    Persistence Mode : Disabled
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0324111084577
    GPU UUID : GPU-7ea505ef-ad46-bb24-c440-69da9b300040
    VBIOS Version : 70.10.46.00.05
    Inforom Version
        Image Version : N/A
        OEM Object : 1.1
        ECC Object : 2.0
        Power Management Object : 4.0
    PCI
        Bus : 0x04
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x109610DE
        Bus Id : 0000:04:00.0
        Sub System Id : 0x091010DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 1
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : 30 %
    Performance State : P12
    Clocks Throttle Reasons : N/A
    Memory Usage
        Total : 5375 MB
        Used : 39 MB
        Free : 5336 MB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 5 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : N/A
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : N/A
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 133276
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : N/A
                Total : 133276
            Double Bit
                Device Memory : 203730
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : N/A
                Total : 203730
    Temperature
        Gpu : 58 C
    Power Readings
        Power Management : Supported
        Power Draw : 33.83 W
        Power Limit : 225.00 W
        Default Power Limit : N/A
        Min Power Limit : N/A
        Max Power Limit : N/A
    Clocks
        Graphics : 50 MHz
        SM : 101 MHz
        Memory : 135 MHz
    Applications Clocks
        Graphics : N/A
        Memory : N/A
    Max Clocks
        Graphics : 573 MHz
        SM : 1147 MHz
        Memory : 1566 MHz
    Compute Processes : None
CUDA sample
Compiling and executing worked as the super user (tested with cuda/C/0_Simple/simpleMultiGPU):
# ldconfig /usr/local/cuda-5.0/lib64/
# ./simpleMultiGPU
[simpleMultiGPU] starting...
CUDA-capable device count: 1
Generating input data...
Computing with 1 GPUs...
GPU Processing time: 27.814000 (ms)
Computing with Host CPU...
Comparing GPU and Host CPU results...
GPU sum: 16777296.000000
CPU sum: 16777294.395033
Relative difference: 9.566307E-08
[simpleMultiGPU] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!
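(Side note: I currently have to run the ldconfig step as root before the sample works. I assume I could instead register the CUDA library directory permanently; a sketch of what I have in mind, where the file name cuda.conf is just something I picked:)
# echo "/usr/local/cuda-5.0/lib64" > /etc/ld.so.conf.d/cuda.conf
# ldconfig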
When I try this as a normal user, I get:
$ ./simpleMultiGPU
[simpleMultiGPU] starting...
CUDA error at simpleMultiGPU.cu:87 code=38(cudaErrorNoDevice) "cudaGetDeviceCount(&GPU_N)"
CUDA-capable device count: 0
Generating input data...
Floating point exception
How can I get CUDA to work for non-superusers?
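Since /dev/nvidia0 and /dev/nvidiactl above belong to root:video, I suspect that adding my user to the video group (and logging in again) might already be enough. Untested sketch, with the group name taken from the ls -l output above:
# usermod -a -G video mthoma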
Testing code
The following code is from "Testing Theano with GPU":
#!/usr/bin/env python
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'
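For reference, the tutorial suggests running the script with the Theano flags on the command line; this should be redundant given my .theanorc below, but this is the invocation (gpu_check.py is just a placeholder name for the script above):
$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python gpu_check.py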
The error message
The complete error message is much too long to post here. A longer version is at http://pastebin.com/eT9vbk7M, but I think the relevant part is:
cc1plus: fatal error: cuda_runtime.h: No such file or directory
compilation terminated.
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 1, 'for cmd', 'nvcc -shared -g -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=bcb411d72e41f81f3deabfc6926d9728,-D NPY_ARRAY_ENSURECOPY=NPY_ENSURECOPY,-D NPY_ARRAY_ALIGNED=NPY_ALIGNED,-D NPY_ARRAY_WRITEABLE=NPY_WRITEABLE,-D NPY_ARRAY_UPDATE_ALL=NPY_UPDATE_ALL,-D NPY_ARRAY_C_CONTIGUOUS=NPY_C_CONTIGUOUS,-D NPY_ARRAY_F_CONTIGUOUS=NPY_F_CONTIGUOUS,-fPIC -Xlinker -rpath,/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray -Xlinker -rpath,/usr/local/cuda-5.0/lib -Xlinker -rpath,/usr/local/cuda-5.0/lib64 -I/usr/local/lib/python2.7/site-packages/Theano-0.6.0rc1-py2.7.egg/theano/sandbox/cuda -I/usr/local/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include -I/usr/include/python2.7 -o /home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray/cuda_ndarray.so mod.cu -L/usr/local/cuda-5.0/lib -L/usr/local/cuda-5.0/lib64 -L/usr/lib64 -lpython2.7 -lcublas -lcudart')
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available
The standard output stream gives:
['nvcc', '-shared', '-g', '-O3', '-m64', '-Xcompiler', '-DCUDA_NDARRAY_CUH=bcb411d72e41f81f3deabfc6926d9728,-D NPY_ARRAY_ENSURECOPY=NPY_ENSURECOPY,-D NPY_ARRAY_ALIGNED=NPY_ALIGNED,-D NPY_ARRAY_WRITEABLE=NPY_WRITEABLE,-D NPY_ARRAY_UPDATE_ALL=NPY_UPDATE_ALL,-D NPY_ARRAY_C_CONTIGUOUS=NPY_C_CONTIGUOUS,-D NPY_ARRAY_F_CONTIGUOUS=NPY_F_CONTIGUOUS,-fPIC', '-Xlinker', '-rpath,/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray', '-Xlinker', '-rpath,/usr/local/cuda-5.0/lib', '-Xlinker', '-rpath,/usr/local/cuda-5.0/lib64', '-I/usr/local/lib/python2.7/site-packages/Theano-0.6.0rc1-py2.7.egg/theano/sandbox/cuda', '-I/usr/local/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include', '-I/usr/include/python2.7', '-o', '/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray/cuda_ndarray.so', 'mod.cu', '-L/usr/local/cuda-5.0/lib', '-L/usr/local/cuda-5.0/lib64', '-L/usr/lib64', '-lpython2.7', '-lcublas', '-lcudart']
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.25972604752 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
1.62323284]
Used the cpu
.theanorc
$ cat .theanorc
[global]
device = gpu
floatX = float32
[cuda]
root = /usr/local/cuda-5.0
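The failing nvcc call above contains no -I/usr/local/cuda-5.0/include, although that is where find located cuda_runtime.h. If Theano's [nvcc] section accepts extra compiler flags (I have not verified this), an adjusted .theanorc might look like this untested sketch:
[global]
device = gpu
floatX = float32

[cuda]
root = /usr/local/cuda-5.0

[nvcc]
flags = -I/usr/local/cuda-5.0/include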