Diagnosing CUDA Kernel problems

Question

CUDA has lots of documentation and guides all over the place, but one thing I haven't been able to find is any instruction on how to diagnose kernels that compile but fail with cryptic, vague error messages such as 'unspecified launch failure', beyond the usual "Do these block/grid structures make sense?" checks.

Can I intercept the cubin file somehow and do some static analysis on the memory structures etc.? Forgive my noobness, but I can't find a definitive idiot's guide anywhere.

Have a good weekend everyone.

What I'm looking for

1. How to separate out the cubin intermediate file.
2. What to do with it afterwards to work out what's going on: specifically, the register and memory configuration, to see if my code is violating any hardware requirements or if I'm just missing an off-by-one error somewhere.

For anyone coming across this later (I seem to have a habit of creating SO questions that keep showing up in my own queries months later...): CUDA-Memcheck gives much more interesting responses than the 'check error' handlers, e.g.:

========= Error: process didn't terminate successfully
========= Invalid __global__ write of size 4
=========     at 0x00000040 in decomp
=========     by thread (1,0,0) in block (0,0,0)
=========     Address 0x00101024 is out of bounds
=========
========= ERROR SUMMARY: 1 error

I don't even have to explain that error message...

So I think we answered part 1, the CUBIN file. Part 2 might depend on exactly what you're doing. Could you give a concrete example of the sort of scenario where you get these errors? — Ade Miller, Apr 16 '11 at 17:19

talonmies · Accepted Answer · 2011-04-16T17:14:04.390

In CUDA, "unspecified launch failure" is the equivalent of a segfault.

Recent toolkit versions ship with a utility called cuda-memcheck. It performs valgrind-like analysis of memory transactions inside an executing kernel and will report buffer overflows or any illegal pointer usage in a kernel. You can use that as a launching point for further analysis. If you are using a Fermi card, there is also in-kernel printf support, so it isn't hard to write your own assert function to test for and report error conditions inside a kernel.
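As a rough sketch of the "write your own assert" idea, something like the following could work. The names `kernel_check`, `KERNEL_CHECK` and `shift_left` are made up for illustration and are not part of the toolkit; the kernel is a stand-in with a deliberate off-by-one candidate, and device-side printf is assumed to be available (compute capability 2.0 or later).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a toolkit API): print which thread failed a
// condition, using device-side printf (needs compute capability 2.0+).
__device__ bool kernel_check(bool ok, const char *what, int line)
{
    if (!ok) {
        printf("check failed: %s (line %d) block (%u,%u,%u) thread (%u,%u,%u)\n",
               what, line,
               blockIdx.x, blockIdx.y, blockIdx.z,
               threadIdx.x, threadIdx.y, threadIdx.z);
    }
    return ok;
}
#define KERNEL_CHECK(cond) kernel_check((cond), #cond, __LINE__)

// Illustrative kernel with an off-by-one candidate: the last thread would
// read in[n], one element past the end, if the check did not stop it.
__global__ void shift_left(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (!KERNEL_CHECK(i + 1 < n)) return;  // report and skip the bad access
    out[i] = in[i + 1];
}

int main()
{
    const int n = 256;
    float *d_in = NULL, *d_out = NULL;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    shift_left<<<1, n>>>(d_in, d_out, n);
    // Device printf output is flushed at synchronization points.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

With this launch, only the last thread should trip the check, so the report names the exact block and thread to look at, much like the cuda-memcheck output does.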

CUDA also ships with a source-level debugger (cuda-gdb), but you need a dedicated GPU to use it. If you are on Linux and only have a single GPU, quit out of X11 and run it from a console TTY.

I'm calling this one the 'answer' as it's what most easily pointed me to the source of my problem, but tbh everyone's answers and comments were very helpful. — Bolster, Apr 16 '11 at 18:37

score 2 · Answer 2 · answered Apr 16 '11 at 16:18

If you set the Keep Preprocessed Files flag (--keep), nvcc will leave the CUBIN files and a host of other intermediate files lying around for you to take a look at. But I'm not sure this will help that much.
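For the register and memory configuration part of the question, the numbers can also be read at run time instead of from the kept files. The sketch below is a different technique from inspecting the CUBIN: it uses the runtime calls `cudaFuncGetAttributes` and `cudaGetDeviceProperties`. The kernel body and the helper `launch_fits` are placeholders invented for illustration, and the check covers only threads per block and shared memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; substitute the kernel under investigation.
__global__ void decomp(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: true when a block of `threads` threads using
// `dynShared` bytes of dynamic shared memory fits the kernel's limits.
bool launch_fits(const cudaFuncAttributes &attr, const cudaDeviceProp &prop,
                 int threads, size_t dynShared)
{
    bool threadsOk = threads <= attr.maxThreadsPerBlock;
    bool sharedOk  = attr.sharedSizeBytes + dynShared <= prop.sharedMemPerBlock;
    return threadsOk && sharedOk;
}

int main()
{
    cudaFuncAttributes attr;
    cudaDeviceProp prop;
    if (cudaFuncGetAttributes(&attr, decomp) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "query failed: %s\n",
                cudaGetErrorString(cudaGetLastError()));
        return 1;
    }

    // Per-kernel resource usage as seen by the runtime.
    printf("registers per thread : %d\n", attr.numRegs);
    printf("static shared memory : %zu bytes\n", attr.sharedSizeBytes);
    printf("local memory / thread: %zu bytes\n", attr.localSizeBytes);
    printf("constant memory      : %zu bytes\n", attr.constSizeBytes);
    printf("max threads per block: %d (for this kernel on this device)\n",
           attr.maxThreadsPerBlock);

    const int threads = 256;  // assumed launch configuration
    printf("256-thread block fits: %s\n",
           launch_fits(attr, prop, threads, 0) ? "yes" : "no");
    return 0;
}
```

Compiling with `nvcc --ptxas-options=-v` prints similar register and memory figures at build time.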

Ade Miller


Thank you, but I guess what I'm wondering is what can I do with them (as stated, I'm not exactly a guru at this...) – Bolster Apr 16 '11 at 16:30
@Andrew Bolster: you can disassemble CUBIN files to look at the final machine code run on the card, but that isn't what you want in this case. – talonmies Apr 16 '11 at 16:39

score 2 · Answer 3 · answered Apr 16 '11 at 16:26

Are you using cudaGetLastError()? It could give more information, if it isn't already where the 'unspecified launch failure' message is coming from.
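For reference, a minimal sketch of that kind of checking might look like this. The macro `CUDA_CHECK`, the kernel `fill` and the wrapper `run_fill` are made-up names for illustration. The idea is to check once right after the launch (configuration errors) and once after synchronizing (errors raised while the kernel ran, which is where 'unspecified launch failure' would typically surface).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with file/line and the runtime's
// error string when a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,        \
                    __LINE__, #call, cudaGetErrorString(err_));        \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Illustrative kernel: write `value` into every element of `out`.
__global__ void fill(int *out, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Launch the kernel with checks and return the first element written.
int run_fill(int n, int value)
{
    int *d_out = NULL;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(int)));

    fill<<<(n + 255) / 256, 256>>>(d_out, n, value);
    CUDA_CHECK(cudaGetLastError());       // bad launch configuration etc.
    CUDA_CHECK(cudaDeviceSynchronize());  // errors during kernel execution

    int first = 0;
    CUDA_CHECK(cudaMemcpy(&first, d_out, sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_out));
    return first;
}

int main()
{
    printf("out[0] = %d\n", run_fill(1024, 7));
    return 0;
}
```

Splitting the two checks at least narrows down whether the launch itself was rejected or the kernel failed while running, even though it won't say which memory access was at fault.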

Adam S.


Thank you for answering; but yes, that's where I'm getting my current info from :) – Bolster Apr 16 '11 at 16:29
