I tried to debug my CUDA application with cuda-gdb but got a strange error.

I built my application with the options -g -G -O0. The program runs without cuda-gdb, but it does not produce the correct result, so I decided to debug it with cuda-gdb. However, I got the following error message when running the program under cuda-gdb:

Error: Failed to read the valid warps mask (dev=1, sm=0, error=16).

What does it mean? Why sm=0, and what is the meaning of error=16?

Update 1: I tried running cuda-gdb on the CUDA samples, but it fails with the same problem. I just installed the CUDA 6.0 Toolkit following NVIDIA's instructions. Is it a problem with my system?

Update 2:

  • OS - CentOS 6.5
  • GPU
    • 1 Quadro 400
    • 2 Tesla C2070
    • I'm using only one GPU for my program, but I get the same error message whichever GPU I select
  • CUDA version - 6.0
  • GPU Driver
    • NVRM version: NVIDIA UNIX x86_64 Kernel Module 331.62 Wed Mar 19 18:20:03 PDT 2014
    • GCC version: gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)

Update 3: I tried to get more information in cuda-gdb, but I got the following results:

    (cuda-gdb) info cuda devices
    Error: Failed to read the valid warps mask (dev=1, sm=0, error=16).
    (cuda-gdb) info cuda sms
    Focus not set on any active CUDA kernel.
    (cuda-gdb) info cuda lanes
    Focus not set on any active CUDA kernel.
    (cuda-gdb) info cuda kernels
    No CUDA kernels.
    (cuda-gdb) info cuda contexts
    No CUDA contexts.

Jongsu Liam Kim
  • You might want to file an nvidia bug report. This is [the link to do it](https://developer.nvidia.com/nvbugs/cuda/add), however you'll need to be [logged in as a registered developer](http://developer.nvidia.com) first. The best scenario would be if you can provide a short program that reproduces the problem, and also provide your exact machine configuration (OS, CUDA version, GPU, GPU driver, etc.) plus whatever cuda-gdb commands are needed to demonstrate the problem. – Robert Crovella Jun 02 '14 at 14:55
  • Updated with details. I can run the sample applications from NVIDIA, but `cuda-gdb` doesn't work with them either, which means running a program without the debugger is fine. – Jongsu Liam Kim Jun 02 '14 at 15:07

2 Answers

This is an internal cuda-gdb bug. You should report it.

Can you try installing the CUDA toolkit from the package on the NVIDIA site?

Eugene

Actually, this issue is specific to some older NVIDIA GPUs (such as the "Quadro 400", "GeForce GT220", or "GeForce GT 330M").

On Liam Kim's setup, cuda-gdb should work fine if you set the environment variable "CUDA_VISIBLE_DEVICES" so that cuda-gdb runs only on the Tesla C2070 GPUs, i.e. `$ export CUDA_VISIBLE_DEVICES=0` (or 2). The exact CUDA device indices can be found by running the CUDA sample "deviceQuery".
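As a rough illustration of that workaround, here is a minimal Python sketch that builds a CUDA_VISIBLE_DEVICES value skipping the affected GPUs. The helper names (`visible_devices`, `debugger_env`) are made up for this example, the list of affected GPUs is just the three names mentioned above and is probably not exhaustive, and the device ordering is an assumption that should be checked against the deviceQuery output on the actual machine.

```python
import os

# Device names listed above as affected; not an exhaustive list.
AFFECTED_GPUS = ("Quadro 400", "GeForce GT220", "GeForce GT 330M")


def _normalize(name):
    """Lowercase and drop spaces so 'GT220' and 'GT 220' compare equal."""
    return name.replace(" ", "").lower()


def visible_devices(device_names, affected=AFFECTED_GPUS):
    """Return a CUDA_VISIBLE_DEVICES value that skips affected GPUs.

    device_names must be in CUDA enumeration order, e.g. as printed by
    the deviceQuery sample (Device 0, Device 1, ...).
    """
    blocked = {_normalize(n) for n in affected}
    keep = [str(i) for i, name in enumerate(device_names)
            if _normalize(name) not in blocked]
    return ",".join(keep)


def debugger_env(device_names, base_env=None):
    """Copy an environment and restrict it to the unaffected GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = visible_devices(device_names)
    return env


# Assumed ordering for the setup in the question; verify with deviceQuery.
names = ["Tesla C2070", "Quadro 400", "Tesla C2070"]
print(visible_devices(names))  # prints 0,2
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run` when launching cuda-gdb on your own binary. Names are matched exactly (after normalization) rather than by substring, so that a "Quadro 4000" is not mistaken for a "Quadro 400".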

This issue has now been fixed. The fix will be available to CUDA developers in the next CUDA release (expected around early July 2014).

kkang
  • The CUDA6.5 Release Candidate (RC) is now available to all CUDA Registered Developers. It should contain this fix. Learn more at: https://developer.nvidia.com/cuda-toolkit – kkang Jul 09 '14 at 05:11
  • I'm using CUDA6.5 and still having this problem. I have a GeForce GTX 860M. If I set a breakpoint in my cuda code, after it is triggered a few times I get "Error: Failed to read the valid warps mask (dev=0, sm=3, error=16)." I have set CUDA_VISIBLE_DEVICES=0. – matth Jan 10 '15 at 11:30