cuda-gdb Error message

Question

I tried to debug my CUDA application with cuda-gdb but got a strange error.

I built my application with the options -g -G -O0. The program runs without cuda-gdb, but it does not produce the correct result, so I decided to debug it with cuda-gdb. However, when I run the program under cuda-gdb, I get the following error message:

Error: Failed to read the valid warps mask (dev=1, sm=0, error=16).

What does it mean? Why is it sm=0, and what does error=16 mean?

Update 1: I tried running the CUDA samples under cuda-gdb, and they fail with the same error. I had just installed the CUDA 6.0 Toolkit following NVIDIA's instructions. Is this a problem with my system?

Update 2:

OS - CentOS 6.5
GPU
- 1 Quadro 400
- 2 Tesla C2070
- My program uses only one GPU, but I get the same error message whichever GPU I select
CUDA version - 6.0
GPU Driver
- NVRM version: NVIDIA UNIX x86_64 Kernel Module 331.62 Wed Mar 19 18:20:03 PDT 2014
- GCC version: gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)

Update 3: I tried to get more information from cuda-gdb, but got the following results:

(cuda-gdb) info cuda devices
Error: Failed to read the valid warps mask (dev=1, sm=0, error=16).
(cuda-gdb) info cuda sms
Focus not set on any active CUDA kernel.
(cuda-gdb) info cuda lanes
Focus not set on any active CUDA kernel.
(cuda-gdb) info cuda kernels
No CUDA kernels.
(cuda-gdb) info cuda contexts
No CUDA contexts.

You might want to file an NVIDIA bug report. This is [the link to do it](https://developer.nvidia.com/nvbugs/cuda/add); however, you'll need to be [logged in as a registered developer](http://developer.nvidia.com) first. Ideally, provide a short program that reproduces the problem, along with your exact machine configuration (OS, CUDA version, GPU, GPU driver, etc.) and whatever cuda-gdb commands are needed to demonstrate the problem. — Robert Crovella, Jun 02 '14 at 14:55
Updated the details. I can run the NVIDIA sample applications, but `cuda-gdb` doesn't work with them either; running programs without the debugger is fine. — Jongsu Liam Kim, Jun 02 '14 at 15:07

score 2 · Answer 1 · answered Jun 02 '14 at 16:39


This is an internal cuda-gdb bug. You should file a bug report.

Can you try installing the CUDA Toolkit from the package on the NVIDIA site?


Eugene


Yes, I installed it from the repository provided on the NVIDIA site. – Jongsu Liam Kim Jun 02 '14 at 19:30
NVIDIA confirmed that it is a bug. They said they are working on a fix now. – Jongsu Liam Kim Jun 06 '14 at 08:16

score 2 · Accepted Answer · answered Jun 20 '14 at 06:09


Actually, this issue is specific to some older NVIDIA GPUs (such as the "Quadro 400", "GeForce GT220", or "GeForce GT 330M").

On Liam Kim's setup, cuda-gdb should work if you set the environment variable `CUDA_VISIBLE_DEVICES` so that cuda-gdb runs only on the Tesla C2070 GPUs, i.e. `export CUDA_VISIBLE_DEVICES=0` (or 2). The exact CUDA device index can be found by running the CUDA sample `deviceQuery`.
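The index lookup can be scripted. This is a minimal sketch, assuming `deviceQuery` has already been built from the CUDA samples and prints one line per GPU of the form `Device 0: "Tesla C2070"`. The function `cuda_indices_matching` and the application name `./my_app` are hypothetical (not part of CUDA or cuda-gdb): the function reads `deviceQuery` output on stdin and prints the indices of every device whose name contains a given string, comma-separated, which is a form `CUDA_VISIBLE_DEVICES` accepts.

```shell
# Hypothetical helper (not part of CUDA): read deviceQuery output on stdin and
# print a comma-separated list of the device indices whose name contains $1.
# Assumes deviceQuery prints one line per GPU of the form:  Device 0: "Tesla C2070"
cuda_indices_matching() {
  awk -v pat="$1" '
    /^Device [0-9]+: "/ {
      idx = $2; sub(/:$/, "", idx)          # field "0:" becomes "0"
      name = $0; sub(/^[^"]*"/, "", name)   # drop everything up to the first quote
      sub(/"$/, "", name)                   # drop the closing quote
      if (index(name, pat) > 0) list = (list == "" ? idx : list "," idx)
    }
    END { print list }
  '
}

# Usage (paths are assumptions; adjust to your samples build and application):
#   ./deviceQuery | cuda_indices_matching "Tesla"
#   CUDA_VISIBLE_DEVICES=$(./deviceQuery | cuda_indices_matching "Tesla") cuda-gdb ./my_app
```

On a machine enumerated like the asker's (Teslas at 0 and 2, Quadro 400 at 1), the first usage line would print `0,2`. Prefixing the variable on the `cuda-gdb` command line limits it to that one debugging session instead of exporting it to the whole shell. If nothing matches, the function prints an empty line, and an empty `CUDA_VISIBLE_DEVICES` hides every GPU, so check the output before relying on it.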

This issue has now been fixed; the fix will be available to CUDA developers in the next CUDA release (expected around early July 2014).


kkang


The CUDA 6.5 Release Candidate (RC) is now available to all CUDA Registered Developers. It should contain this fix. Learn more at: https://developer.nvidia.com/cuda-toolkit – kkang Jul 09 '14 at 05:11
I'm using CUDA 6.5 and still have this problem, on a GeForce GTX 860M. If I set a breakpoint in my CUDA code, after it has been hit a few times I get "Error: Failed to read the valid warps mask (dev=0, sm=3, error=16)." I have set CUDA_VISIBLE_DEVICES=0. – matth Jan 10 '15 at 11:30
