1

I'm launching a CUDA kernel I've compiled, using the cuLaunchKernel() driver API function. I'm passing my parameters in a kernelParams array, and passing nullptr for the extra argument.

Unfortunately, this fails with the error CUDA_ERROR_INVALID_HANDLE. Why? I checked the Driver API documentation to see in which cases the function might fail, and it discusses failure with CUDA_ERROR_INVALID_VALUE (not the same thing). It doesn't discuss the error I get.

Since more than one parameter to cuLaunchKernel() is some sort of handle, what does this failure mean? (And if there are multiple options, what are they?)

einpoklum

3 Answers

2

One possibility is a failure due to a CUDA driver context switch. You may have inadvertently performed some action which pushes or replaces the current context for the CUDA device. Loaded modules belong to a context, so the handle to your compiled and loaded kernel is not valid in the now-current context, and the launch fails with CUDA_ERROR_INVALID_HANDLE.

Assuming this is the case, switch the context before the launch, e.g. this way:

cuCtxPushCurrent(my_driver_context);
cuLaunchKernel(/*etc. etc. */);
/* possibly */ cuCtxPopCurrent(NULL);

or like so:

cuCtxSetCurrent(my_driver_context);
cuLaunchKernel(/*etc. etc. */);

Note that you risk a leak if you pop and discard the only reference to a valid context, and you also risk breaking other code which assumes that the context it put in place is still the current one.
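One way to make the push/pop pairing harder to get wrong is a scoped guard that pops in its destructor. The following is only a sketch of that idea, not driver code: FakeContext, fake_push_current(), fake_pop_current(), fake_get_current(), ScopedContext and launch_in() are all hypothetical stand-ins, so that the example runs without a GPU. In real code the guard would presumably wrap cuCtxPushCurrent() and cuCtxPopCurrent() and check their CUresult return values.

```cpp
#include <cassert>
#include <vector>

// ASSUMPTION: a toy model of the driver's per-thread context stack.
// None of these names exist in the CUDA driver API; they stand in for
// cuCtxPushCurrent(), cuCtxPopCurrent() and cuCtxGetCurrent().
using FakeContext = int;
static std::vector<FakeContext> g_context_stack;

int fake_push_current(FakeContext ctx) {
    g_context_stack.push_back(ctx);
    return 0;  // 0 plays the role of CUDA_SUCCESS
}

int fake_pop_current(FakeContext* popped) {
    if (g_context_stack.empty()) return 1;  // nothing to pop
    if (popped) *popped = g_context_stack.back();
    g_context_stack.pop_back();
    return 0;
}

FakeContext fake_get_current() {
    // -1 plays the role of "no current context"
    return g_context_stack.empty() ? -1 : g_context_stack.back();
}

// Pushes a context on construction and pops it on destruction, so the
// previously current context is restored even on early return.
class ScopedContext {
public:
    explicit ScopedContext(FakeContext ctx) { fake_push_current(ctx); }
    ~ScopedContext() { fake_pop_current(nullptr); }
    ScopedContext(const ScopedContext&) = delete;
    ScopedContext& operator=(const ScopedContext&) = delete;
};

// Hypothetical launch wrapper: makes the module's context current for the
// duration of the call and reports which context the "launch" saw.
FakeContext launch_in(FakeContext module_context) {
    ScopedContext guard(module_context);
    return fake_get_current();  // real code would call cuLaunchKernel() here
}
```

With this shape, whatever context some other code had made current is current again once launch_in() returns, which addresses the second risk mentioned above. It does not by itself address the lifetime of the context being pushed.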

einpoklum
-1

Well, in my case it was an out-of-memory (OOM) error which for some reason was not reported as such. When I reduced the batch size of my model, it worked. Maybe you should check whether this is the case for you as well.

Eypros
  • How sure are you that it was OOM, rather than another potential effect of your changes? Can you post a MWE of this? – einpoklum Dec 20 '21 at 09:16
  • There wasn't any other change from my side (besides the batch size). So, it seems rather reasonable to have been an OOM error. – Eypros Dec 27 '21 at 20:31
-1

Run cuobjdump -symbols myModule.cubin to check whether your function's name has been changed (mangled) by the compiler. If so, add extern "C" before your kernel function's declaration.
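As a rough illustration of what "changed" means here: a C++ compiler encodes the parameter types into the symbol name, so a kernel declared as myKernel(float*, int) would typically appear as something like _Z8myKernelPfi rather than myKernel. The sketch below assumes Itanium-style mangling (names starting with "_Z"); looks_mangled() is a hypothetical helper for checking names copied from the cuobjdump output, not part of any CUDA tool or library.

```cpp
#include <cassert>
#include <string>

// Heuristic only: assumes Itanium C++ ABI mangling, where mangled names
// begin with "_Z". A name declared extern "C" keeps its plain spelling.
bool looks_mangled(const std::string& symbol) {
    return symbol.rfind("_Z", 0) == 0;  // true if symbol starts with "_Z"
}
```

If the name does look mangled, the alternative to extern "C" is to look the kernel up by that exact mangled string instead of the plain name.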