CUDA has plenty of documentation and guides scattered all over the place, but one thing I haven't been able to find is any form of instruction on how to diagnose kernels that compile but then fail at run time with cryptic, vague error messages such as 'unspecified launch failure', beyond the usual "Do these block/grid dimensions make sense?" checks.
Can I intercept the cubin file somehow and do some static analysis on the memory structures etc.? Forgive my noobness, but I can't find a definitive idiot's guide anywhere.
Have a good weekend everyone.
What I'm looking for:
- How to separate out the cubin intermediate file
- What to do with it afterwards to work out what's going on: specifically, how to inspect the register and memory configuration to see whether my code is violating any hardware limits, or whether I'm just missing an off-by-one error somewhere.
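As a rough sketch of the register/memory side of this, the runtime API can report a kernel's resource usage and surface launch errors without touching the cubin by hand. The kernel `scale` and the `CUDA_CHECK` macro below are hypothetical stand-ins, not anything from my actual code; the runtime calls (`cudaFuncGetAttributes`, `cudaGetLastError`, `cudaDeviceSynchronize`) are the standard ones. For the cubin itself, `nvcc -cubin` writes the file out and `nvcc --ptxas-options=-v` prints per-kernel register and shared memory usage at compile time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the error string and bail out of main().
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel; substitute the kernel being diagnosed.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    // Ask the runtime what the compiled kernel actually consumes.
    cudaFuncAttributes attr;
    CUDA_CHECK(cudaFuncGetAttributes(&attr, scale));
    printf("registers per thread : %d\n", attr.numRegs);
    printf("static shared memory : %zu bytes\n", attr.sharedSizeBytes);
    printf("local memory/thread  : %zu bytes\n", attr.localSizeBytes);
    printf("max threads per block: %d\n", attr.maxThreadsPerBlock);

    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_data = NULL;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    scale<<<blocks, threads>>>(d_data, n, 2.0f);

    // Launch-configuration problems show up here...
    CUDA_CHECK(cudaGetLastError());
    // ...while faults during execution show up at the next synchronizing call.
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

If `threads` exceeds the reported `maxThreadsPerBlock`, the launch itself should fail at `cudaGetLastError()`, which at least separates "bad configuration" from "kernel did something illegal while running".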
For anyone coming across this later (I seem to have a habit of creating SO questions that keep showing up in my own queries months later...): CUDA-Memcheck gives much more informative responses than the 'check error' handlers, e.g.
========= Error: process didn't terminate successfully
========= Invalid __global__ write of size 4
========= at 0x00000040 in decomp
========= by thread (1,0,0) in block (0,0,0)
========= Address 0x00101024 is out of bounds
=========
========= ERROR SUMMARY: 1 error
I don't even have to explain that error message...
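For illustration only, here is a minimal sketch of the kind of bug that tends to produce an out-of-bounds `__global__` write report. This is not the `decomp` kernel from the output above; `fill_unchecked` and `fill_checked` are hypothetical names. The grid is rounded up to a whole number of blocks, so the unchecked kernel has threads whose index runs past the end of the allocation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical buggy kernel: every thread writes, including the threads
// in the last block whose index is >= n.
__global__ void fill_unchecked(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

// Same kernel with the usual guard against the rounded-up grid.
__global__ void fill_checked(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

int main()
{
    const int n = 1000;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // 4 blocks, 1024 threads

    int *d_out = NULL;
    cudaError_t err = cudaMalloc(&d_out, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Swap in fill_unchecked here and run the binary under cuda-memcheck
    // to see how the 24 surplus threads get reported.
    fill_checked<<<blocks, threads>>>(d_out, n);

    err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    cudaFree(d_out);
    return 0;
}
```

Without memcheck, a small overrun like this may not trigger any error at all, which is part of why the plain 'check error' approach is so unhelpful here. Compiling with `-lineinfo` should also let the tool map the faulting address back to a source line.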