
I'm trying to call a CUDA kernel from another kernel, but I get the following error:

Traceback (most recent call last):
  File "C:\temp\GPU Program Shell.py", line 22, in <module>
    """)
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 262, in __init__
    arch, code, cache_dir, include_dirs)
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 252, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir)
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 134, in compile_plain
    cmdline, stdout=stdout.decode("utf-8"), stderr=stderr.decode("utf-8"))
pycuda.driver.CompileError: nvcc compilation of c:\users\karste~1\appdata\local\temp\tmpgq8t45\kernel.cu failed
[command: nvcc --cubin -arch sm_35 -m64 -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu]
[stderr:
kernel.cu(14): error: kernel launch from __device__ or __global__ functions requires separate compilation mode
]

My understanding is that this has to do with Dynamic Parallelism, and that the other question related to this error came from a user without appropriate hardware. I have a GTX Titan, however, so it should be compatible. What am I missing?
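For reference, here is a minimal sketch (with hypothetical kernel names, not the actual code from the failing program) of the pattern that produces this error: a `__global__` function launching another kernel from device code.

```cuda
// Sketch of dynamic parallelism (compute capability 3.5+):
// a kernel launched from inside another kernel.

__global__ void child(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= 2.0f;  // example work
}

__global__ void parent(float *data)
{
    // Device-side launch: nvcc rejects this unless relocatable
    // device code / separate compilation (-rdc=true) is enabled.
    child<<<4, 256>>>(data);
}
```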

EDIT

After adding `options=['--cubin','-rdc=true','-lcudart','-lcudadevrt,','-Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu']` to SourceModule, I get the following error:

Traceback (most recent call last):
  File "C:\temp\GPU Program Shell.py", line 22, in <module>
""", options=['--cubin','-rdc=true' ,'-lcudart', '-lcudadevrt,','-Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu'])
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 265, in __init__
self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: not found - 

1 Answer


PyCUDA is compiling the CUDA code on the fly with this command:

nvcc --cubin -arch sm_35 -m64 -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu

In order to compile code containing dynamic parallelism, it's necessary to add specific switches to the compile command to enable separate compilation, device code linking, linking of the device runtime library, and the appropriate architecture target (sm_35).

Some examples of valid nvcc command combinations are given in the programming guide section on dynamic parallelism.

Your command line should look something like:

nvcc --cubin -arch=sm_35 -m64 -rdc=true -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu -lcudadevrt

You may also wish to read the nvcc manual on separate compilation.

Robert Crovella
  • Thanks for the pointer, Robert. I'm gonna check this out when I get back home. – Hair of Slytherin Oct 14 '13 at 20:03
  • I made the changes by adding the -rdc, -I and -lcudadevrt switches into the options list for SourceModule. I get the error edited into the original question! Life is sad. – Hair of Slytherin Oct 15 '13 at 02:06
  • It's a different problem now. PyCUDA is not finding the cubin. There could be a number of reasons for this, such as code architectural mismatch, a problem with the compiler cache, a compile error of some sort, or perhaps other issues. Are you working in a 32 bit or 64 bit environment? Are you able to run other pycuda codes? I'm not a pycuda expert, so you might get better traction if you post a new question covering this issue. – Robert Crovella Oct 15 '13 at 05:59
  • Hmmm, OK. I'm working in a 64 bit environment and am able to run other pycuda Codes. I'll take your recommendation and post another question but it seems like no matter what I do, dynamic parallelism hits a roadblock. I tried a similar experiment in C++ last night after googling this new problem for hours and hit a LINK2019 error. God hates me. – Hair of Slytherin Oct 15 '13 at 14:35
  • And for anyone looking for the answer to the separate compilation issue indicated before the edit, Robert nailed it. – Hair of Slytherin Oct 15 '13 at 14:37
  • I'm reasonably sure we can get dynamic parallelism to work for you in C++. I'm slightly less sure about PyCUDA, you're apparently not the only one who has run into [something like this](http://lists.tiker.net/pipermail/pycuda/2013-February/004249.html). – Robert Crovella Oct 15 '13 at 15:06
  • I hope so! And yea, that's the same thing that I've found for PyCUDA with no resolution. – Hair of Slytherin Oct 15 '13 at 18:33