How to declare function with argument that is a closure in C++11 in a Cuda device function?

Question

I'm working in CUDA with C++11 (I don't think CUDA supports later C++ versions yet). I have a closure object that is passed to the function Process(), which calls the closure on each iteration.

I understand that std:: functionality is generally not available in CUDA device code. For example, when I try to use std::function< float(uint32_t) >, I get this error:

error: calling a host function("std::function ::function< ::, void, void> ") from a global function("_NV_ANON_NAMESPACE::LargeKernel") is not allowed

What can I replace lookupFunc with so that this compiles without std::function being available? I was able to work around this by creating a function template to deduce the type of the lambda function.

This code works and shows the workaround I've employed:

//using lookupFunc = std::function< float(uint32_t) >;

template< typename Lambda > // Work around with function template
__device__
void Process(float       * const outData,
             const  int32_t      locationX,
             const Lambda /* lookupFunc */ lambda)
{
    float answer = 0.f;

    for( int32_t offset = 0 ; offset < 1024 ; ++offset )
    {
        const float value = lambda( offset );

        answer += value;
    }

    outData[ locationX ] = answer;
}

__global__
void LargeKernel(const float * const inData,
                 float       * const outData)
{
    constexpr uint32_t cellStride = 1;
    const     int32_t  locationX  = threadIdx.x + blockDim.x * blockIdx.x;
    const auto lambda
        = [locationX, inData, cellStride](const int32_t offset)
          {
              return inData[ locationX + offset + cellStride ];
          };

    Process( outData, locationX, lambda );
}

I also tried:

using lookupFunc = float(* const)(uint32_t);

But that gives the error:

error: no suitable conversion function from "const lambda ->float" to "float (*)(uint32_t)" exists

How can I declare the type of the third argument to Process() without using a template?

The [Cuda Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#c-cplusplus-language-support) tells us about C++14 support from version 9.0 on. And there is `nvstd::function`, see e.g. [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#polymorphic-function-wrappers). — BlameTheBits, Mar 29 '19 at 22:18
@Shadow That resolves the question, can you post it as an answer? — WilliamKF, Mar 29 '19 at 23:26

score 3 · Accepted Answer · answered Mar 30 '19 at 00:10

The CUDA equivalent of std::function is nvstd::function. As of CUDA 8.0, nvstd::function can be used in both host and device code, but "instances cannot be passed from host code to device code (and vice versa) at run time". Its use is explained in the CUDA programming guide.
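As a minimal sketch of how this might look for the code in the question: the template parameter is replaced by an nvstd::function that is created inside the kernel, so the wrapper never crosses the host/device boundary. This assumes the `<nvfunctional>` header that ships with the CUDA toolkit. The alias name `LookupFunc`, the constant `kWindow`, and the small `main` harness are illustrative and not part of the original post; error checking is left out for brevity.

```cuda
#include <nvfunctional>   // nvstd::function
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical alias, standing in for the commented-out lookupFunc in the question.
using LookupFunc = nvstd::function<float(int32_t)>;

// Number of lookups per thread (1024 in the question).
constexpr int32_t kWindow = 1024;

// Non-template device function: the closure type is erased by nvstd::function.
__device__
void Process(float* const outData,
             const int32_t locationX,
             const LookupFunc& lookup)
{
    float answer = 0.f;
    for (int32_t offset = 0; offset < kWindow; ++offset)
    {
        answer += lookup(offset);
    }
    outData[locationX] = answer;
}

__global__
void LargeKernel(const float* const inData, float* const outData)
{
    constexpr uint32_t cellStride = 1;
    const int32_t locationX = threadIdx.x + blockDim.x * blockIdx.x;

    // The wrapper is constructed in device code from a capturing lambda.
    const LookupFunc lookup =
        [locationX, inData, cellStride](const int32_t offset)
        {
            return inData[locationX + offset + cellStride];
        };

    Process(outData, locationX, lookup);
}

// Illustrative host harness (an assumption, not from the question).
int main()
{
    constexpr int threads = 32;
    // Highest index read is (threads - 1) + (kWindow - 1) + cellStride.
    const size_t inCount = threads + kWindow + 1;

    std::vector<float> hostIn(inCount, 1.0f);
    float hostOut[threads] = {};

    float* devIn = nullptr;
    float* devOut = nullptr;
    cudaMalloc(&devIn, inCount * sizeof(float));
    cudaMalloc(&devOut, threads * sizeof(float));
    cudaMemcpy(devIn, hostIn.data(), inCount * sizeof(float),
               cudaMemcpyHostToDevice);

    LargeKernel<<<1, threads>>>(devIn, devOut);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hostOut, devOut, sizeof(hostOut), cudaMemcpyDeviceToHost);
    std::printf("outData[0] = %f\n", hostOut[0]);

    cudaFree(devIn);
    cudaFree(devOut);
    return 0;
}
```

This should build with something like `nvcc -std=c++11 example.cu`. The type-erased wrapper is likely to cost more per call than the template version in the question, where the compiler can inline the lambda, so the template workaround may still be preferable in hot loops.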

score 0 · Answer 2 · answered Mar 29 '19 at 21:58

Is it necessary to use a lambda?

If not, you can simulate it by declaring a struct with a call operator:

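#include <cstdint>

// Hand-written equivalent of the lambda's closure type.
struct noLambda
 {
   std::int32_t const   locationX;
   float const * const  inData;
   std::uint32_t const  cellStride;

   // __device__ is needed because the object is constructed inside the kernel.
   __device__
   noLambda (std::int32_t l0, float const * const i0, std::uint32_t c0)
      : locationX{l0}, inData{i0}, cellStride{c0}
    { }

   // __device__ is needed because Process() calls this from device code.
   __device__
   float operator() (std::int32_t const offset) const
    { return inData[ locationX + offset + cellStride ]; }
 };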

So the signature of Process() becomes

__device__
void Process(float       * const outData,
             const  int32_t      locationX,
             const noLambda lambda)

and it can be called as follows:

Process( outData, locationX, noLambda{locationX, inData, cellStride} );

(caution: code not tested)

Yes, I need to use lambda, the above example is a simplified one to demonstrate the issue. — WilliamKF, Mar 29 '19 at 22:06
