I do not understand the behaviour of the following code:

template< bool b >
struct Foo {
  Foo() = default;
  __host__   Foo( const Foo & ) requires(  b ) {}
  __device__ Foo( const Foo & ) requires( !b ) {}  
};

template< typename Lambda >
__global__
void kernel( Lambda ) {}

int main() {
  Foo< true > foo;
  auto la = [foo] __device__ (){ };
  kernel<<< 1, 1 >>>( la );
}

When I compile it with nvcc 12.1 and gcc 11.3 as the host compiler (nvcc main.cu -std=c++20 --expt-extended-lambda), I get two errors:

  • copy constructor for class "Foo<true>" is ambiguous
        auto la = [foo] __attribute__((device)) (){ };
                   ^
    
  • function "lambda []()->void::<unnamed>(const lambda []()->void &)"
    (declared implicitly) cannot be referenced -- it is a deleted function
        kernel<<< 1, 1 >>>( la );
                            ^
    

I do not understand either error.

  • Why is the copy ctor ambiguous?
  • Which function is deleted, and where does the compiler try to use it?
tommsch
  • "Which function is deleted" - the lambda's copy constructor. It is deleted because the lambda captures `Foo` by value and `Foo` has an ambiguous copy constructor. I don't know why `Foo`'s copy constructor is ambiguous, though; I don't know concepts well enough to answer. – Yksisarvinen Jul 26 '23 at 08:52
  • `la` is declared to be on the device side, but `Foo`'s copy constructor is on the host side, so maybe the lambda can't be copied? Does this work with `Foo foo`? – Artyer Jul 26 '23 at 08:55
  • @Artyer I added printf statements to see which copy ctor is called. It's always the host copy ctor. – tommsch Jul 26 '23 at 08:58
  • Are you sure GCC 11.3 supports `requires(b)` correctly? The `requires` keyword comes from C++20, while the first version of GCC 11 was released in April 2021 (and minor versions certainly do not introduce features like this). Are you using CUDA 12 or later (see [this](https://stackoverflow.com/questions/70701532/using-c20-in-the-nvcc-compiler-for-cuda), though some specific features might still cause issues)? – Jérôme Richard Jul 26 '23 at 10:21
  • @JérômeRichard I am using CUDA 12, and according to `https://gcc.gnu.org/projects/cxx-status.html` gcc 11 should support concepts. – tommsch Jul 26 '23 at 10:40
  • Seeing that a corresponding CPU version [compiles and runs](https://godbolt.org/z/ca4nKPcxs) fine using gcc 11.3 (12 and 13 as well as clang 14 and 16 work fine too), that adding `__host__ __device__` to both overloads of the copy constructor of `Foo` [still gives the same error](https://cuda.godbolt.org/z/3xYd7x19o), and that having everything as device functions [also gives the same error](https://cuda.godbolt.org/z/MGanEev1a), this might actually be a bug in nvcc. – paleonix Jul 26 '23 at 12:39
  • I found out from the CUDA documentation that an extended lambda is wrapped by nvcc inside a `std::function`. Could this explain the behaviour? – tommsch Jul 26 '23 at 12:42
  • For the pure device version you [can get rid](https://cuda.godbolt.org/z/3cjf9Gbqv) of the `-extended-lambda` flag, so I'm not sure if that applies. Even better, put the pure host version into `nvcc` and [get the same error](https://cuda.godbolt.org/z/bvxPjzEMT). I would file a bug (maybe check CUDA 12.2 first). – paleonix Jul 26 '23 at 12:47
  • I found the same issue with CUDA 12.2 and gcc 11.3. – paleonix Jul 26 '23 at 15:35
  • @paleonix Thanks for checking. I will file a bug. – tommsch Jul 26 '23 at 15:38

0 Answers