
I'm getting an out-of-resources error when trying to launch a CUDA kernel (through PyCUDA), and I'm wondering if it's possible to get the system to tell me which resource I'm short on. Obviously the system knows which resource has been exhausted; I just want to be able to query that as well.

I've used the occupancy calculator and everything seems okay, so either there's a corner case it doesn't cover or I'm using it wrong. I know it's not registers (which seem to be the usual culprit), because I'm using at most 63 per thread and the launch still fails with a 1x1x1 block and a 1x1 grid on a compute capability 2.1 device.
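Since the driver won't name the exhausted resource, one fallback is to compare the kernel's reported usage against the device's per-block limits yourself. Below is a minimal sketch. It assumes the usage numbers come from `ptxas -v` output, or, in PyCUDA, from the compiled function's `num_regs` and `shared_size_bytes` attributes. It also assumes the defaults are the compute capability 2.x limits (63 registers per thread, 32768 registers per block, 48 KB of shared memory per block). `DeviceLimits` and `exceeded_limits` are hypothetical helpers, not part of PyCUDA or CUDA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceLimits:
    """Per-thread and per-block limits (defaults: compute capability 2.x)."""

    max_regs_per_thread: int = 63
    max_regs_per_block: int = 32768
    max_smem_per_block: int = 48 * 1024  # bytes


def exceeded_limits(block, regs_per_thread, static_smem, dynamic_smem=0,
                    limits=DeviceLimits()):
    """Return the limits a launch with this block shape would exceed.

    block is an (x, y, z) tuple. This is a rough check: it ignores register
    allocation granularity, so a launch close to a limit may still fail.
    """
    threads = block[0] * block[1] * block[2]
    problems = []
    if regs_per_thread > limits.max_regs_per_thread:
        problems.append(
            f"registers per thread: {regs_per_thread} > {limits.max_regs_per_thread}")
    regs_per_block = regs_per_thread * threads
    if regs_per_block > limits.max_regs_per_block:
        problems.append(
            f"registers per block: {regs_per_block} > {limits.max_regs_per_block}")
    smem = static_smem + dynamic_smem
    if smem > limits.max_smem_per_block:
        problems.append(
            f"shared memory per block: {smem} > {limits.max_smem_per_block} bytes")
    return problems


# A 1x1x1 block using 36 registers and 492 bytes of static shared memory.
print(exceeded_limits((1, 1, 1), 36, 492))  # prints []
```

With a single-thread block this check finds nothing. That is consistent with the first answer below, where the cause turned out to be the kernel arguments rather than these limits.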

Thanks for any help. I posted a thread on the NVIDIA forums:

http://forums.nvidia.com/index.php?showtopic=206261&st=0

but got no responses. If the answer is "you can't ask the system for that information", that would be nice to know too (sort of... ;).

Edit:

The highest register usage I've seen is 63; I've edited the question above to reflect that.

Eli Stevens

2 Answers


I think PyCUDA uses the CUDA driver API, so the following may be what is wrong. When kernels are launched with cuLaunch(), CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES can be returned if you pass too few arguments or pass arguments of the wrong size. With PyCUDA it is easy to mismatch the argument list the kernel expects and the arguments you actually pass, so check how you are calling your kernels.
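The driver won't say which argument is wrong, but a pre-launch sanity check can catch the count and size mismatches described above. The following is a minimal sketch. It assumes you write the kernel's parameter list as `struct`-module format characters, the same style of type string PyCUDA's `prepare()` accepts. `FakeDevicePtr` and `check_kernel_args` are hypothetical names used only for illustration. The usual fix for what it flags is to wrap scalars in explicitly sized NumPy types such as `np.int32(n)` and `np.float32(x)`.

```python
import struct

import numpy as np


class FakeDevicePtr:
    """Hypothetical stand-in for a device pointer argument (e.g. a GPUArray)."""

    def __init__(self, address):
        self.address = address


def check_kernel_args(expected, args):
    """Compare kernel arguments against an expected struct-format signature.

    Returns a list of problems; an empty list means the argument count and
    per-argument byte sizes match. Only sizes are compared, so int32 vs
    float32 (both 4 bytes) is not flagged.
    """
    problems = []
    if len(args) != len(expected):
        problems.append(f"expected {len(expected)} arguments, got {len(args)}")
    for i, (want, arg) in enumerate(zip(expected, args)):
        if isinstance(arg, FakeDevicePtr):
            got = "P"  # native pointer size (8 bytes in a 64-bit process)
        elif isinstance(arg, np.generic):
            got = arg.dtype.char  # 'f' for float32, 'd' for float64, 'i' for int32
        else:
            # A bare Python int/float has no fixed C width, so flag it.
            problems.append(
                f"arg {i}: bare {type(arg).__name__}; wrap it in a sized NumPy "
                f"type such as np.int32 or np.float32")
            continue
        want_size, got_size = struct.calcsize(want), struct.calcsize(got)
        if want_size != got_size:
            problems.append(
                f"arg {i}: expected {want_size} bytes ({want!r}), "
                f"got {got_size} bytes ({got!r})")
    return problems


# Hypothetical kernel: __global__ void scale(float *out, const float *in, int n, float k)
SIGNATURE = "PPif"
ptr = FakeDevicePtr(0x1000)
print(check_kernel_args(SIGNATURE, [ptr, ptr, np.int32(10), np.float64(2.0)]))
```

In real PyCUDA code, declaring the signature once with `prepare()` and launching through the prepared-call methods has PyCUDA pack each argument to the declared type. That addresses the same problem without a separate check.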

I think this error code is poorly named for this situation...

harrism
  • This was the error, so thank you for suggesting that. I still wish there were a way to ask what's going on, but it sounds like that's not available. – Eli Stevens Aug 01 '11 at 06:36
  • I was holding off since my real question was "can I get the system to tell me this directly?", but based on some conversations elsewhere, I'm pretty sure the answer is "No, CUDA doesn't have an API for that." – Eli Stevens Aug 02 '11 at 02:01
  • Yes, the problem is there are often many ways to cause the same error -- being able to tell you exactly what went wrong is a very tricky thing to support. – harrism Aug 02 '11 at 02:50

See this answer: CUDA maximum registers per thread: sm_12 vs sm_20

It seems 70 registers per thread is too many; devices of compute capability 2.x allow at most 63.

jmsu
  • Sorry, but that is not the underlying problem, nor is it what I am looking for. Even if I make changes to the code to get the register count down to: "Used 36 registers, 492+0 bytes smem, 152 bytes cmem[0], 8 bytes cmem[14], 20 bytes cmem[16]" it still fails. However, the point is that I'm looking for an API to tell me "not enough registers" rather than having to deduce that by hand. – Eli Stevens Aug 01 '11 at 00:02
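Deducing usage by hand usually starts from the `ptxas -v` report quoted in the comment above. The following is a minimal sketch of pulling those numbers out with regular expressions. It assumes the exact wording of that line, and other CUDA releases word the report differently. `parse_ptxas_usage` is a hypothetical helper, and it simply sums the two terms of the `492+0` shared-memory figure rather than interpreting them separately.

```python
import re


def parse_ptxas_usage(line):
    """Extract resource usage from one `ptxas -v` "Used ..." line.

    Fields that don't appear in the line are omitted from the result.
    """
    usage = {}
    regs = re.search(r"Used (\d+) registers", line)
    if regs:
        usage["registers"] = int(regs.group(1))
    smem = re.search(r"(\d+)\+(\d+) bytes smem", line)
    if smem:
        # The two terms are summed; this sketch does not interpret them separately.
        usage["smem_bytes"] = int(smem.group(1)) + int(smem.group(2))
    # Constant memory is reported per bank, e.g. "152 bytes cmem[0]".
    cmem = re.findall(r"(\d+) bytes cmem\[(\d+)\]", line)
    if cmem:
        usage["cmem_bytes"] = {int(bank): int(size) for size, bank in cmem}
    return usage


line = ("Used 36 registers, 492+0 bytes smem, 152 bytes cmem[0], "
        "8 bytes cmem[14], 20 bytes cmem[16]")
print(parse_ptxas_usage(line))
# {'registers': 36, 'smem_bytes': 492, 'cmem_bytes': {0: 152, 14: 8, 16: 20}}
```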