CUDA supports dynamically sized shared memory allocation at kernel launch time, but the mechanism works a bit differently from OpenCL. In the CUDA runtime API, a kernel using dynamically sized shared memory, and the launch that sizes that memory, use the following syntax:
__global__ void kernel(...)
{
    extern __shared__ typename buffer[];
    ....
}
....
kernel<<< griddim, blockdim, sharedmem, streamID >>>(...)
where sharedmem is the total number of bytes per block that will be allocated to buffer.
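For concreteness, here is a minimal runnable sketch of that pattern (the kernel name reverse_block, the int element type, and the array size are illustrative choices, not part of the syntax above): each block reverses its slice of the input array by staging it through the dynamically sized buffer.

#include <cstdio>

// Each block reverses its slice of the input, staging the elements
// through dynamically sized shared memory.
__global__ void reverse_block(const int *in, int *out)
{
    extern __shared__ int buffer[];   // sized by the launch, not here

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    buffer[blockDim.x - 1 - tid] = in[gid];
    __syncthreads();
    out[gid] = buffer[tid];
}

int main()
{
    const int n = 256, nbytes = n * sizeof(int);
    int h_in[n], h_out[n];
    for (int i = 0; i < n; i++) h_in[i] = i;

    int *d_in, *d_out;
    cudaMalloc(&d_in, nbytes);
    cudaMalloc(&d_out, nbytes);
    cudaMemcpy(d_in, h_in, nbytes, cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    size_t sharedmem = n * sizeof(int);
    reverse_block<<< 1, n, sharedmem >>>(d_in, d_out);

    cudaMemcpy(h_out, d_out, nbytes, cudaMemcpyDeviceToHost);
    printf("h_out[0] = %d (expect %d)\n", h_out[0], n - 1);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

This compiles with nvcc as usual; no special flags are needed for dynamic shared memory.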
In PyCUDA, the same mechanism works something like this:
mod = SourceModule("""
__global__ void kernel(...)
{
    extern __shared__ typename buffer[];
    ....
}
""")
func = mod.get_function("kernel")
func.prepare(..., shared=sharedmem)
func.prepared_call(griddim, blockdim, ...)
with the shared memory allocation size, in bytes per block, passed to the prepare method.
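Again as an illustrative sketch (the kernel and variable names are ours; the shared= keyword to prepare follows the description above, although newer PyCUDA releases instead take a shared_size keyword in prepared_call), the same reversal kernel driven from PyCUDA might look like this:

import numpy as np
import pycuda.autoinit  # creates a context on the first available GPU
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Each block reverses its slice of the input through dynamically
# sized shared memory, as in the CUDA example above.
mod = SourceModule("""
__global__ void reverse_block(const int *in, int *out)
{
    extern __shared__ int buffer[];

    int tid = threadIdx.x;
    buffer[blockDim.x - 1 - tid] = in[tid];
    __syncthreads();
    out[tid] = buffer[tid];
}
""")
func = mod.get_function("reverse_block")

n = 256
a = np.arange(n, dtype=np.int32)
out = np.empty_like(a)

d_a = drv.mem_alloc(a.nbytes)
d_out = drv.mem_alloc(out.nbytes)
drv.memcpy_htod(d_a, a)

sharedmem = a.nbytes  # bytes of dynamic shared memory per block

# "PP" describes the two pointer arguments. Depending on the PyCUDA
# version, the shared memory size goes either to prepare() via
# shared= (as here) or to prepared_call() via shared_size=.
func.prepare("PP", shared=sharedmem)
func.prepared_call((1, 1), (n, 1, 1), d_a, d_out)

drv.memcpy_dtoh(out, d_out)
assert (out == a[::-1]).all()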