This is an extension of the discussion here: pycuda shared memory error "pycuda._driver.LogicError: cuLaunchKernel failed: invalid value"
Is there a method in pycuda that is equivalent to the following C++ API call?
#define SHARED_SIZE 0x18000 // 96 kbyte
cudaFuncSetAttribute(func, cudaFuncAttributeMaxDynamicSharedMemorySize, SHARED_SIZE)
Working on a recent GPU (Nvidia V100), going beyond 48 kbyte shared memory requires this function attribute be set. Without it, one gets the same launch error as in the topic above. The "hard" limit on the device is 96 kbyte shared memory (leaving 32 kbyte for L1 cache).
There's a deprecated method Fuction.set_shared_size(bytes)
that sounds promising, but I can't find what it's supposed to be replaced by.