
I am doing GPGPU development on Arch Linux with the cuda-sdk and cuda-toolkit packages. Running cuda-gdb as a normal user on a simple program results in:

$ cuda-gdb ./driver
NVIDIA (R) CUDA Debugger
4.2 release
Portions Copyright (C) 2007-2012 NVIDIA Corporation
GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/nwh/Dropbox/projects/G4CU/driver...done.
(cuda-gdb) run
Starting program: /home/nwh/Dropbox/projects/G4CU/driver 
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
fatal:  The CUDA driver initialization failed. (error code = 1)

If I run cuda-gdb as root, it behaves normally:

# cuda-gdb ./driver
NVIDIA (R) CUDA Debugger
4.2 release
Portions Copyright (C) 2007-2012 NVIDIA Corporation
GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/nwh/Dropbox/work/2012-09-06-cuda_gdb/driver...done.
(cuda-gdb) run
Starting program: /home/nwh/Dropbox/work/2012-09-06-cuda_gdb/driver 
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff5ba8700 (LWP 11386)]
[Context Create of context 0x6e8a30 on Device 0]
[Launch of CUDA Kernel 0 (thrust::detail::backend::cuda::detail::launch_closure_by_value<thrust::detail::backend::cuda::for_each_n_closure<thrust::device_ptr<unsigned long long>, unsigned int, thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned long long> > > ><<<(1,1,1),(704,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 1 (set_vector<<<(1,1,1),(10,1,1)>>>) on Device 0]
vd[0] = 0
vd[1] = 1
vd[2] = 2
vd[3] = 3
vd[4] = 4
vd[5] = 5
vd[6] = 6
vd[7] = 7
vd[8] = 8
vd[9] = 9
[Thread 0x7ffff5ba8700 (LWP 11386) exited]

Program exited normally.
[Termination of CUDA Kernel 1 (set_vector<<<(1,1,1),(10,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 0 (thrust::detail::backend::cuda::detail::launch_closure_by_value<thrust::detail::backend::cuda::for_each_n_closure<thrust::device_ptr<unsigned long long>, unsigned int, thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned long long> > > ><<<(1,1,1),(704,1,1)>>>) on Device 0]

The test program driver.cu is:

// needed for nvcc with gcc 4.7 and iostream
#undef _GLIBCXX_ATOMIC_BUILTINS
#undef _GLIBCXX_USE_INT128

#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

__global__
void set_vector(int *a)
{
  // get thread id
  int id = threadIdx.x + blockIdx.x * blockDim.x;
  a[id] = id;
  __syncthreads();
}

int main(void)
{
  // settings
  int len = 10; int trd = 10;

  // allocate vectors
  thrust::device_vector<int> vd(len);

  // get the raw pointer
  int *a = thrust::raw_pointer_cast(vd.data());

  // call the kernel
  set_vector<<<1,trd>>>(a);

  // print vector
  for (int i=0; i<len; i++)
    std::cout << "vd[" << i << "] = " << vd[i] << std::endl;

  return 0;
}

driver.cu is compiled with the command:

$ nvcc -g -G -gencode arch=compute_20,code=sm_20 driver.cu -o driver

How can I get cuda-gdb to run without root permissions?

Some more information: the output from nvidia-smi is:

$ nvidia-smi
Mon Sep 10 07:16:32 2012       
+------------------------------------------------------+                       
| NVIDIA-SMI 4.304.43   Driver Version: 304.43         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro FX 1700           | 0000:01:00.0     N/A |                  N/A |
| 60%   52C  N/A     N/A /  N/A |   4%   20MB /  511MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla C2070              | 0000:02:00.0     Off |                    0 |
| 30%   82C    P8    N/A /  N/A |   0%   11MB / 5375MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

The display is connected to the Quadro and I run CUDA applications on the Tesla.

nwhsvc
  • Can you provide some more information : 1. Can you run CUDA applications without the debugger as a normal user ? 2. If you first run a CUDA application as root, do subsequent CUDA applications start as a normal user ? 3. What is the version of cuda-gdb and the CUDA driver you are using ? – Vyas Sep 08 '12 at 00:07
  • @Vyas, (1) yes, I can run CUDA applications as a normal user. (2) yes, if I first run an app as root (with `su` or `sudo`) I can later run the app as a normal user. (3) `cuda-gdb` says that it is the 4.2 release with GNU gdb 7.2. I am using nvidia driver version 304.43. Thank you for considering this problem! – nwhsvc Sep 10 '12 at 13:48

2 Answers


Thank you. From the sounds of it, your problem is that the required device nodes are not getting created. Usually, running X creates the device nodes that the CUDA software stack needs to communicate with the hardware. When X is not running, as is the case here, running as root creates the nodes; a normal user cannot create them due to a lack of permissions. The recommended approach when running a Linux system without X is to run the following script as root (from the getting started guide at http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_Getting_Started_Linux.pdf):

#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  # Create one device node per controller, plus the control node.
  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi

Note that the device nodes need to be recreated on each boot, so it is best to add this script (or a similar one) to your startup sequence.
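As an illustration of the node-numbering logic, here is what the script's counting would work out to on a system like the one in the question (one VGA controller plus one 3D controller). The `lspci` lines below are simulated for the example, not real output:

```shell
#!/bin/bash
# Simulated `lspci` output for a two-GPU system: one VGA compatible
# controller (the Quadro) and one 3D controller (the Tesla).
NVDEVS='01:00.0 VGA compatible controller: NVIDIA Corporation Quadro FX 1700
02:00.0 3D controller: NVIDIA Corporation Tesla C2070'

# Same counting logic as the script above: one node per controller,
# with indices starting at zero.
N3D=$(echo "$NVDEVS" | grep -c "3D controller")
NVGA=$(echo "$NVDEVS" | grep -c "VGA compatible controller")
N=$((N3D + NVGA - 1))

# The script would create /dev/nvidia0 .. /dev/nvidiaN plus /dev/nvidiactl.
for i in $(seq 0 $N); do
  echo "/dev/nvidia$i"
done
echo "/dev/nvidiactl"
```

With two controllers this yields `/dev/nvidia0` and `/dev/nvidia1` (character devices with major 195, minors 0 and 1) plus the control node `/dev/nvidiactl` (minor 255), which matches the nodes reported in `/dev` in the comments below.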

@Till: Apologies for posting questions as an answer :). I am new to SO and do not have enough reputation to comment.

Vyas
  • Hi @Vyas, I experience the same problem when X is and is not running. When X is running, I see nodes `nvidia0`, `nvidia1`, and `nvidiactl` in `/dev`. They have a permission level `crw-rw-rw-`. The nodes are also present if I shutdown X. – nwhsvc Sep 11 '12 at 17:33
  • Oops, I missed your first answer - that you were able to run apps as a normal user. This rules out the dev nodes from being the issue. In that case can you try/check the following : 1. Remove the cuda-gdb temporary directory. `rm -rf /tmp/cuda-dbg` 2. Ensure that yama's ptrace restrictions are not present and enabled. – Vyas Sep 12 '12 at 01:35
  • 1. there is no `/tmp/cuda-gdb` on my system. 2. there is also no directory named `yama` in `/proc/sys/kernel/`. Does that mean yama is not active? – nwhsvc Sep 12 '12 at 03:25
  • The folder is `/tmp/cuda-dbg/`(note the spelling : its not a typo). If the yama directory is not in procfs, you likely do not have yama active. – Vyas Sep 12 '12 at 04:38
  • Sorry! `/tmp/cuda-dbg` does not exist. When I start `cuda-gdb`, the directory is created and owned by my normal user. – nwhsvc Sep 12 '12 at 17:51
  • @nwhsvc Sorry if this was already verified earlier, but *just before* you try debugging with cuda-gdb, can you check that the device nodes (/dev/nvidia*) are present? – Mayank Sep 20 '12 at 16:13
  • Could you directly email cudatools@nvidia.com with a description of your problem ? Could you also include the output of `cat /proc/driver/nvidia/version` ? Can you also issue ` set cuda debug general 1` before issuing `run` in cuda-gdb as a normal user, and include the trace ? – Vyas Sep 20 '12 at 17:13
  • Yes! I will do that today. I've already filed a bug with the Arch Linux package. The maintainer was able to reproduce the problem. – nwhsvc Sep 24 '12 at 19:20

This problem has been fixed with the latest NVIDIA driver (304.60) and the latest version of CUDA (5.0.35). cuda-gdb no longer requires root permissions to run.

nwhsvc