How do I retrieve the parameter list information for a CUDA 4.0+ kernel?

Question

According to the NVidia documentation for the cuLaunchKernel function, kernels compiled with CUDA 3.2+ contain information regarding their parameter list. Is there a way to retrieve this information programmatically from a CUfunction handle? I need to know the number of arguments and the size of each argument in bytes of a kernel from its CUfunction handle. I have seen the above-referenced NVidia documentation saying that this information exists, but I haven't seen anywhere in the CUDA documentation indicating a programmatic way to access this information.

To add a little more explanation: I'm working with a middleware system. Its frontside library replaces libcuda (the driver API library) on the target system. The backside then runs as a daemon on another host that has the GPGPU resource being used and calls into the real libcuda on that machine. There are other middleware solutions that already do this with cuLaunchKernel, so it's definitely possible. Also, CUDA itself uses this information in order to know how to parse the parameters from the pointer that you pass into cuLaunchKernel.

Edit: I originally had the CUDA version where this metadata was introduced listed incorrectly. It was 3.2, not 4.0, according to the cuLaunchKernel documentation.

CUfunction is the kernel, prefixed with __global__. Do you need the size of CUfunction's arguments? If you have the kernel you can find it. — lashgar, Dec 10 '12 at 05:49
Yes, I need the size of its arguments. I do not have the source to the kernel, only a handle to it (presumably returned from a previous call to cuModuleGetFunction().) Specifically, I need the number of arguments and the size of each argument. — reirab, Dec 10 '12 at 06:04

score 1 · Answer 1 · answered Dec 10 '12 at 12:45

1

cuLaunchKernel is designed to launch kernels for which you know the function prototype. There is no API for "reverse engineering" the function prototype.

answered Dec 10 '12 at 12:45

harrism

26,505
2
57
88

2

it did occur to me that *if* the kernel was compiled with C++ linkage, it should be possible to reverse engineer the mangled symbol name from the symbol of the device ELF payload in a fatbinary or cubin. But if the kernel is compiled with C linkage that won't work...... – talonmies Dec 10 '12 at 15:20
haha, yeah, unfortunately I can't assume C++ linkage was used. According to the documentation for cuLaunchKernel, the number and type of arguments is stored as metadata with any kernel compiled with CUDA 4.0+, but I haven't seen any public API for accessing this information. That metadata is how cuLaunchKernel itself parses the arguments, though, and I'm aware of other middleware that does support cuLaunchKernel, so there has to be some way to get to it. I was hoping for something better than parsing the cubin myself, but that may be what it comes down to. – reirab Dec 11 '12 at 02:00
I believe the metadata are used by the driver API to simplify the API for launching kernels -- the old way required an API call for every kernel argument. The intent is not to make the metadata available, but to make the API easier to use. Why do you say that the existence of other middleware implies that this metadata must be available? – harrism Dec 11 '12 at 05:04
Yes, I'm aware of why the change was made. The reason I say this data must be available in some manner (not necessarily through a public API) is that it is impossible for middleware to intercept the cuLaunchKernel() function and correctly pass its parameters to another machine (as I'm aware of middleware that does) without knowing the number of size of those parameters. This is what I'm doing also. I'm working with a library that replaces libcuda on the target machine and passes the parameter data to another machine where it is passed into the real libcuda functions. – reirab Dec 11 '12 at 14:51
Sorry, that should say "number and size," not "number of size." – reirab Dec 11 '12 at 15:15
Can you please provide an example of middleware that does what you describe? – harrism Dec 12 '12 at 00:34
Hmm... I was thinking rCUDA did this, but upon further review, it looks like it might only support the Runtime API. So, there might not be middleware that does this already. CUDA itself certainly uses this information to extract the parameters to launch the kernel (according to the documentation I linked in the original post,) but I'm not sure if any other middleware does this now or not. – reirab Dec 12 '12 at 19:24
The CUDA runtime has the function prototype info generated by the compiler, no? – harrism Dec 13 '12 at 04:55
1

Yes, it does. According to the documentation for cuLaunchKernel, the compiler stores that function prototype information in the binary, which the runtime then uses to figure out the number of parameters and size of each parameter. I'm trying to find a way to read that same information. – reirab Dec 13 '12 at 08:42
@reirab I know this is a long shot asking 11 years later :) Did you ever find out where in the kernel fatbin/elf/binary this information is stored. I am also trying to find it... – Simon Kirsten Aug 12 '23 at 13:13
1

@SimonKirsten Sorry, at this point I honestly don't even remember if I found it. I haven't worked on any of that stuff since I finished grad school in 2015. – reirab Aug 12 '23 at 18:50

score 1 · Answer 2 · answered Dec 07 '16 at 08:46

I'm working on the same issue (I don't know if in between you solved it). I'm using a known kernel to investigate how che CUfunction pointed memory is used. This is the no parameters version:

#include<cstdio>

extern "C" {
    __global__ void HelloWorld(){
        int thid = (blockIdx.x * blockDim.x) + threadIdx.x;
    }
}

This is the one parameter version and so on.

#include<cstdio>

extern "C" {
    __global__ void HelloWorld(int a) {
        int thid = (blockIdx.x * blockDim.x) + threadIdx.x;
    }
}

I suggest you to dump the first 1024 bytes of the memory pointed by CUfunction and follow the pointers. For example at the 0x30 offset there is a pointer pointing to a table of pointers. I noticed that the size of the struct posted by CUfunction doesn't change with the number of the function parameters, so the table we are looking have to be hunted following the pointers.

How do I retrieve the parameter list information for a CUDA 4.0+ kernel?

2 Answers2