casting accessors to C++ pointers in kernel code (esp. (int (*)[Nelem])

Question

Environment: Ubuntu 18.04, OneAPI beta 6

Full code is below, but here's the offending error:

#dpcpp -O2 -g -o so2 so2.cpp -lOpenCL -lsycl

so2.cpp:64:38: error: cannot cast from type 'global_ptr<int>' (aka 'multi_ptr<int,  access::address_space::global_space>') to pointer type 'int (*)[nelem]'
                int (*xptr)[nelem] = (int (*)[nelem])xaccessor.get_pointer();                                     
                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

A bit of explanation in case you're wondering why....

When developing data parallel code I often develop what Intel used to call "elemental functions". These are written to operate on a single element of the application (what SYCL would call a work item). I've always found this an easier do with a basic SW development environment, easy to test, and more generally reuseable (scalar, SIMD, CUDA, etc...).

After getting things tested on a single element, moving to data parallel is pretty easy by expanding the calling code without having to rewrite/retest the functions:

    int x[NELEM]
    fn1(x, NELEM)

becomes

    int x[NPROC][NELEM]
    for (int p=0; p<NPROC; p++) fn1(x[p], NELEM);

In a SYCL kernel, fn1(x[item.get_linear_id()], NELEM); would be all I need without having to rewite the function(s) to understand ids and/or accessors.

The SYCL issue with the above code is that in the kernal C++ I can not seem recast the accessor pointer to a 2D pointer. This is allowable in the application C++ (see code above).

Maybe this is a bad way to right code, but it makes it easy to develop/test code that works for scalar and data parallel codeand keeps libraries somewhat portable. It's also provides a way around the SYCL 3 dimension limit on buffers/accessors.

Anyway, I'm curious as to what a real SYCL programmer would think.

Full code for toy example:

#include <CL/sycl.hpp>
#include <cstdio>

namespace sycl = cl::sycl;

const int Nproc=3;
const int Nelem=4;

/** elemental function **/
void
fn1(int *h, int n)
{
  for (int i=0; i<n; i++) h[i] = 10*h[i]+2*i;
}

int 
main(int argc, char *argv[])
{

  /** Make some memory **/
  int x1d[Nproc * Nelem];
  for (int j=0; j<Nproc; j++) {
    for (int i=0; i<Nelem; i++) x1d[j*Nelem+i] = 10*j+i;
  }
  printf("1D\n");
  for (int i=0; i<Nelem; i++) {
    printf("%d : ", i);
    for (int j=0; j<Nproc; j++) printf("%d ", x1d[j*Nelem+i]);
    printf("\n");
  }

  /** Reshape it into 2D **/
  int (*x2d)[Nelem] = (int (*)[Nelem])x1d;
  for (int j=0; j<Nproc; j++) fn1(x2d[j], Nelem);
  printf("2D\n");
  for (int i=0; i<Nelem; i++) {
    printf("%d : ", i);
    for (int j=0; j<Nproc; j++) printf("%d ", x2d[j][i]);
    printf("\n");
  }

  /** SYCL setup **/
  sycl::device dev = sycl::default_selector().select_device();
  std::cout << "Device: " 
      << "name: " << dev.get_info<sycl::info::device::name>() << std::endl
      << "vendor: " << dev.get_info<sycl::info::device::vendor>() << std::endl;
  sycl::queue q(dev);

  {
    sycl::buffer<int, 1> xbuffer(x1d, sycl::range<1> {Nproc*Nelem});

    q.submit([&](sycl::handler& cgh) {
        int nelem = Nelem;
        auto xaccessor = xbuffer.get_access<sycl::access::mode::read_write, sycl::access::target::global_buffer>(cgh);
        cgh.parallel_for<class k0>(
            sycl::range<1> {Nproc}, 
            [=] (sycl::item<1> item) {
                int idx = item.get_linear_id();
#if 0
                int *xptr = (int *)xaccessor.get_pointer();    // doing this does work so we _can_ get a real pointer
                fn1(xptr + nelem*idx, nelem);
#else
                int (*xptr)[nelem] = (int (*)[nelem])xaccessor.get_pointer();
                //int *ptr = (int *)xaccessor.get_pointer();   // splitting it into two doesn't work either
                //int (*xptr)[nelem] = (int (*)[nelem])ptr;
                fn1(xptr[idx], nelem);
#endif
                }
            );
        }
        ); 
  }
  printf("2D SYCL\n");
  for (int i=0; i<Nelem; i++) {
    printf("%d : ", i);
    for (int j=0; j<Nproc; j++) printf("%d ", x1d[j*Nelem+i]);
    printf("\n");
  }
}

Edit 1:

Per illuhad's comment I tried to flesh out some alternatives.

First the two commented lines seem like they should do what he suggests:

    int *ptr = (int *)xaccessor.get_pointer();
    int (*xptr)[nelem] = (int (*)[nelem])ptr;

but in fact it yields this error:

    error: cannot initialize a variable of type 'int (*)[nelem]' with an rvalue of type 'int (*)[nelem]'
                int (*xptr)[nelem] = (int (*)[nelem])ptr;
                      ^              ~~~~~~~~~~~~~~~~~~~

adding a "get()" to the end of the get_pointer yields the same.

Curiously, addressing the "initialize" part of the error:

    int *ptr = (int *)xaccessor.get_pointer().get();
    int (*xptr)[nelem];
    xptr = (int (*)[nelem])ptr;

Yields the amusing error:

    error: incompatible pointer types assigning to 'int (*)[nelem]' from 'int (*)[nelem]'
                xptr = (int (*)[nelem])ptr;
                       ^~~~~~~~~~~~~~~~~~~

So if/when someone gets the time, I'm still curious...

Don't have time for an in-depth reply at the moment, so haven't verified, but I suspect the issue is that `get_pointer()` returns a `global_ptr`, not a pointer. `global_ptr` defines a conversion operator to `T*`, such that it can be implicitly converted to `T*`. This is why the 1D case works. For 2D, you would first need to explicitly convert the `global_ptr` to `T*` using the `get()` member function. You can then cast this pointer to whatever you like. — illuhad, May 17 '20 at 18:25

illuhad · Accepted Answer · 2020-05-19T15:21:04.190

Short answer: Not a SYCL issue ;)

Based on your edit 1, it is clear that if the lines

int *ptr = (int *)xaccessor.get_pointer();
int (*xptr)[nelem] = (int (*)[nelem])ptr;

cause a conversion error in the second line, it cannot really be a DPC++/SYCL issue as there are only variations of int pointers involved, and nothing related to SYCL is going on here.

In fact, the issue is that nelem is not a compile-time constant. So, the following non-SYCL test program

int main(){
  int nelem = 10;
  int* ptr = nullptr;
  int (*xptr)[nelem] = (int (*)[nelem])ptr;
}

reproduces your issue when compiling with regular clang or gcc with -pedantic. By default however, gcc supports variable length arrays as an extension in C++, so the code happens to compile even if it is not valid C++.

Your issue is solved by turning nelem into a compile-time constant, as required by C++. Variable length arrays are part of newer versions of C, but not part of C++.

Great! originally I had use Nelem vs. nelem but was getting some other weird error (I can't reproduce it now but it was something about Nelem not having automatic storage (?)). Anyway, using Nelem and .get() seems to get me over the hump. Thanks again. — justapony, May 20 '20 at 01:40

casting accessors to C++ pointers in kernel code (esp. (int (*)[Nelem])

1 Answers1