
This is an extension of the discussion here: pycuda shared memory error "pycuda._driver.LogicError: cuLaunchKernel failed: invalid value"

Is there a method in PyCUDA that is equivalent to the following CUDA runtime API call?

#define SHARED_SIZE 0x18000 // 96 kbyte
cudaFuncSetAttribute(func, cudaFuncAttributeMaxDynamicSharedMemorySize, SHARED_SIZE)

On a recent GPU (an Nvidia V100), using more than 48 kbyte of shared memory requires that this function attribute be set. Without it, one gets the same launch error as in the topic above. The "hard" limit on that device is 96 kbyte of shared memory (leaving 32 kbyte for L1 cache).
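For context, both limits can be queried from PyCUDA itself. A minimal sketch, assuming a PyCUDA build recent enough to expose the opt-in device attribute from the CUDA 9+ headers:

import pycuda.autoinit
import pycuda.driver as drv

dev = pycuda.autoinit.device
# Default per-block limit: 49152 bytes (48 kbyte) on a V100
print(dev.get_attribute(drv.device_attribute.MAX_SHARED_MEMORY_PER_BLOCK))
# Opt-in limit reachable via cudaFuncAttributeMaxDynamicSharedMemorySize:
# 98304 bytes (96 kbyte) on a V100 -- assumes this enum member is exposed
print(dev.get_attribute(drv.device_attribute.MAX_SHARED_MEMORY_PER_BLOCK_OPTIN))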

There's a deprecated method Function.set_shared_size(bytes) that sounds promising, but I can't find what it's supposed to be replaced by.

dag

2 Answers


PyCUDA uses the driver API, and the corresponding function call for setting a function's dynamic shared memory limit is cuFuncSetAttribute.

I can't find that anywhere in the current PyCUDA tree, and therefore suspect that it has not been implemented.

talonmies

I'm not sure if this is what you're looking for, but this might help someone looking in this direction.

The dynamic shared memory size in PyCUDA can be set using either:

  1. the shared argument in the direct kernel call (the "unprepared call"). For example:

     myFunc(arg1, arg2, shared=numBytes, block=(1,1,1), grid=(1,1))

  2. the shared_size argument in the prepared kernel call. For example:

     myFunc.prepared_call(grid, block, arg1, arg2, shared_size=numBytes)

where numBytes is the amount of dynamic shared memory, in bytes, to allocate at kernel launch.
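To make this concrete, here is a minimal self-contained sketch using both call styles. The doubler kernel and the numBytes value are made up for illustration, and the example stays under the default 48 kbyte limit (these arguments only set the per-launch dynamic size; they do not lift the opt-in cap the question asks about):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void doubler(float *out, int n)
{
    extern __shared__ float buf[];  // dynamic shared memory, sized at launch
    int i = threadIdx.x;
    if (i < n) {
        buf[i] = 2.0f * i;
        out[i] = buf[i];
    }
}
""")
doubler = mod.get_function("doubler")

n = 256
numBytes = n * np.float32().nbytes  # dynamic shared memory per block, in bytes
out = np.zeros(n, dtype=np.float32)

# 1. Unprepared call: pass shared=
doubler(drv.Out(out), np.int32(n),
        shared=numBytes, block=(n, 1, 1), grid=(1, 1))

# 2. Prepared call: pass shared_size=
out_gpu = drv.mem_alloc(out.nbytes)
doubler.prepare("Pi")  # P = pointer, i = 32-bit int
doubler.prepared_call((1, 1), (n, 1, 1), out_gpu, np.int32(n),
                      shared_size=numBytes)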

VP06