1

In conventional C++, it's possible to create a multi-dimensional "viewer" or "wrapper" to a 1D buffer in linear memory by (1) defining a custom ArrayWrapper class, (2) overriding the () or [] operator as its "accessor", doing the address calculation inside this member function, and (3) returning a reference to the value. Thus, a 4D array can be accessed via the syntactic sugar array(a, b, c, d) = val. This improves code readability, and also decouples the viewer from the actual memory layout of the array.

#include <iostream>
#include <cstdlib>

template <typename T>
class ArrayWrapper
{
public:
        ArrayWrapper(T *buf) : array(buf) {};

        inline T& operator() (size_t a, size_t b, size_t c, size_t d)
        {
                return array[a + b + c + d];
        }

        const inline T& operator() (size_t a, size_t b, size_t c, size_t d) const
        {
                return array[a + b + c + d];
        }
        T *array;
};

int main(void)
{
        int *buf = (int *) malloc(sizeof(int) * 100);
        ArrayWrapper<int> array(buf);
        array(1, 2, 3, 4) = 42;

        std::cout << array(1, 2, 3, 4) << std::endl;
}

However, this wrapper is not usable in a DPC++ / SYCL 2020 kernel.

int main(void)
{
        sycl::queue Q;
        auto buf = sycl::malloc_shared<int>(20, Q);
        ArrayWrapper<int> array(buf);

        Q.single_task([=]() {
                array(1, 2, 3, 4) = 42;
        });
        Q.wait();

        std::cout << array(1, 2, 3, 4) << std::endl;
}

Compiling this function with Intel DPC++ compiler returns the following error:

question-sycl.cpp:37:21: error: expression is not assignable
                array(1, 2, 3, 4) = 42;
                ~~~~~~~~~~~~~~~~~ ^
1 error generated.
make: *** [Makefile:8: question-sycl.elf] Error 1

This is the result due to the use of C++ lambda function, which "captures" variable outside its scope as const variables by default. In conventional C++, this can be solved by either explicitly asking the lambda function to capture a reference Q.single_task([&array]() {}, or declaring the lambda as a mutable function Q.single_task([=]() mutable {}. However, both usages appear to be unsupported in SYCL and prohibited by the DPC++ compiler.

Is there a way to implement the same syntactic sugar array(a, b, c, d) = val in DPC++ / SYCL 2020? I noticed that memory access in SYCL is provided by two abstractions called buffers and accessors. Unfortunately, they only support 1D, 2D, or 3D arrays, not higher dimensions. What is the best way to define a convenient wrapper for accessing high-dimension arrays?

比尔盖子
  • 2,693
  • 5
  • 37
  • 53

3 Answers3

1

As you say, captured objects in SYCL are not mutable, and for good reason: It is very unclear whether all work items should access a shared object of kernel arguments, or whether each work item should have its own copy -- ultimately this depends strongly on the backend / hardware and what they want to do. So we decided that all SYCL kernel arguments should be immutable.

You have two options (and you have already found one):

  1. Just copy the kernel argument:
 Q.single_task([=]() {
   ArrayWrapper<int> a_kernel = array;
   a_kernel(1, 2, 3, 4) = 42;
 });

  1. You already found this one: If this is not acceptable, think about the constness-model of your wrapper. What the immutability property of SYCL kernel arguments really cares about is whether data within the kernel arguments changes. In your case, your wrapper only provides a view -- the wrapper object itself does not change. As such, it might be acceptable to have the const overload return a non-const reference which would solve your issue and might be more appropriate for your use case. Note that true const views could still be represented by instantiating your wrapper with const T type. You could even implement conversions from ArrayWrapper<T> to ArrayWrapper<const T> if you like. As you say, this is how sycl::accessor objects are implemented.

I want to point out that you might not have to implement your own high-dimensional array wrapper. You should be able to use mdspan which already provides this functionality, and initialize it with a SYCL USM pointer. I have no idea about DPC++, but I know that this work in hipSYCL / Open SYCL.

illuhad
  • 506
  • 2
  • 5
0

Mutable closures are currently not supported by DPC++

The solution is capturing a pointer to array:

    auto f = [array = &array]() {
        (*array)(1, 2, 3, 4) = 42;
    };

sycl::queue Q is not available to me, therefore I have simplified the example by removing the irrelevant Q.

273K
  • 29,503
  • 10
  • 41
  • 64
  • Unfortunately capturing a pointer doesn't seem to work, it passes the compiler check but all memory writes have no effect. I guess it's an undefined behavior caused by a subtle violation of SYCL's assumption on shared memory between host and device. But... taking the address of the reference inside the lambda function works. As in `{ auto ptr = &array; (*ptr)(1, 2, 3, 4) = 42; }` So at least there's a workaround. Thanks for the pointer! – 比尔盖子 Jun 06 '23 at 04:37
  • This will also have a lifetime issue if Q processes the commands asynchronsouly – Pepijn Kramer Jun 06 '23 at 05:56
0

Update: The problem here is that the SYCL kernel expects a const member function of operator() for array, but here, two definitions were provided, one is a non-const member function that returns a reference, another is a const member function that returns a const value. As a result, the const version of the function is matched which returns a const value. This cannot be modified, by definition.

Thus, the solution is to remove the const-return version of operator():

const inline T& operator() (size_t a, size_t b, size_t c, size_t d) const
{
        return array[a + b + c + d];
}

And changing the non-const-return version of the member function to be a const member function, as in:

inline T& operator() (size_t a, size_t b, size_t c, size_t d) const
{
    return array[a + b + c + d];
}

Problem solved.

I noticed that this is also how SYCL accessors themselves are implemented.

When passing a ArrayWrapper into other functions within the lambda function (which is the compute kernel), the argument must be declared const, when they're in fact modifiable. This is confusing but is a expected form of usage in SYCL / DPC++ programming. Thus, these wrappers should not be called Array, but ArrayWrapper or ArrayAccessor to highlight the fact that only the wrapper itself is const - the data is not.


Outdated answer: 273K's answer offered me a hint on the possible workaround of capturing a pointer, as in:

Q.single_task([array = &array]() {
        (*ptr)(1, 2, 3, 4) = 42;
});
Q.wait();

Unfortunately, it passes the compiler check but all memory writes have no effect and are invisible to the host. I guess it's an undefined behavior caused by a subtle violation of SYCL's assumption on shared memory between host and device.

But, taking the address of the reference inside the lambda function works:

Q.single_task([=]() {
        auto ptr = &array;
        (*ptr)(1, 2, 3, 4) = 42;
});
Q.wait();

However, this workaround is not reliable. It depends on the fact that DPC++ compiler doesn't realize the missing constness of auto ptr (in fact, the compiler rejects ArrayWrapper<int>* ptr but not auto ptr).

比尔盖子
  • 2,693
  • 5
  • 37
  • 53
  • It's a very strange compiler. Can it accept `std::bind(&decltype(array)::operator(), &array, 1, 2, 3, 4)`? – 273K Jun 06 '23 at 04:49
  • @273K The kernel runs inside the GPU but other C++ objects exist within the CPU. DPC++ does many things behind the scene (including copying) so they can be treated the same. This edge case probably breaks it. I'm going to ask for some clarification on DPC++'s forum. – 比尔盖子 Jun 06 '23 at 04:52
  • @273K Runtime Type Identification is also forbidden within a SYCL compute kernel. – 比尔盖子 Jun 06 '23 at 04:54
  • I mean how it could deduce `ptr` with losing the const qualifier.. – 273K Jun 06 '23 at 04:54
  • You can try `ArrayWrapper::` instead of `decltype(array)::`. BTW decltype is not RTTI. – 273K Jun 06 '23 at 04:55
  • @273K Well, `error: SYCL kernel cannot call through a function pointer`. Outside the kernel, DPC++ is just LLVM/clang++, but within the kernel, only a limited subset of C++ is allowed. I still believe SYCL must have provided some ways to achieve my goal. I will try DPC++ developer forum... – 比尔盖子 Jun 06 '23 at 05:05