
I am trying to write a kind of "map" class that wraps oneAPI calls and hides hardware-targeting issues behind a parameter specifying the kind of target (CPU or GPU/accelerator). The map directs the code either to a SYCL kernel or to TBB, implementing the map operation with a parallel for. It takes as parameters the device type (CPU or GPU) and the function, and applies the function to all the items in the collection. But in the kernel function I get an error saying that implicit capture is not allowed. I cannot understand what my mistake is. This is my code:

    #include <CL/sycl.hpp>
    #include <iostream>
    #include <tbb/tbb.h>
    #include <tbb/parallel_for.h>
    #include <vector>
    #include <string>
    #include <queue>
    #include<tbb/blocked_range.h>
    #include <tbb/global_control.h>
    
    using namespace std;
    using namespace cl::sycl;
    using namespace tbb;
    
    template<typename Tin, typename Tout>
    class Map {
    private:
        function<Tout(Tin)> fun;
        string device_type;
    public:
        Map() {}
        Map(function<Tout(Tin)> f):fun(f) {}
        void f(function<Tout(Tin)> ff) {
            fun = ff;
           }
        void set_device(string dev) {
                device_type = dev;
            }
    
    
    vector<Tout> operator()(vector<Tin>& v) {
        sycl::device my_dev;  // default device (the original `new device()` was never deleted)
        if(device_type == "cpu"){
            if(my_dev.is_cpu()) {
                vector<Tout> r(v.size());  // sized so that r[index] is a valid element
                tbb::parallel_for(tbb::blocked_range<size_t>(0, v.size()),
                        [&](const tbb::blocked_range<size_t>& t) {
                    for (size_t index = t.begin(); index < t.end(); ++index){
                        r[index] = fun(v[index]);
                    }
                });
                return r;
            }
        }else if(device_type == "gpu"){
            if(my_dev.is_gpu()) {
                vector<Tout> r(v.size());
                sycl::queue gpuQueue{gpu_selector()};
                sycl::range<1> n_item{v.size()};
                {
                    // the buffers copy their data back into r when they are destroyed
                    sycl::buffer<Tin, 1> in_buffer(v.data(), n_item);
                    sycl::buffer<Tout, 1> out_buffer(r.data(), n_item);
                    gpuQueue.submit([&](sycl::handler& h){
                        //local copy of fun
                        //auto f = fun;
                        sycl::accessor in_accessor(in_buffer, h, sycl::read_only);
                        sycl::accessor out_accessor(out_buffer, h, sycl::write_only);
                        h.parallel_for(n_item, [=](sycl::id<1> index) {
                            // error reported here: `fun` is a member, so [=] captures `this`
                            out_accessor[index] = fun(in_accessor[index]);
                        });
                    }).wait();
                }
                return r;
            }
        }
        return {};  // requested device not available (previously fell off the end: undefined behaviour)
    }
    
    };
    
    int main(int argc, char *argv[]) {
    
    
        vector<int> v = {1,2,3,4,5,6,7,8};
    
        auto f = [](int x){return (++x);};
    
        sycl::device dev = sycl::cpu_selector().select_device();
    if (argc < 2) {  // argv[1] is read below
        cerr << "usage: " << argv[0] << " cpu|gpu\n";
        return 1;
    }
    string dev_type = argv[1];
        Map <int,int> m(f);
        m.set_device(dev_type);
        auto r = m(v);
        for(auto &e:r) {
            cout << e << "\n";
        }
    
      return 0;
    }

When I check the Problems view in Eclipse, it shows this error:

    implicit capture of 'this' is not allowed for kernel functions

Sahar Sa

1 Answer


You are trying to access `fun` in your kernel, and `fun` is a member variable of `Map`. In C++, member variables are accessed through the `this` pointer, so a `[=]` lambda that uses `fun` implicitly captures `this` (this implicit capture is deprecated since C++20). SYCL kernel lambdas are not allowed to capture `this` implicitly, hence the error message.

However, even if you were to capture `this` explicitly in your kernel, it would not work, because `this` points to host memory, which in general is not accessible on the device.

One very simple fix for this is usually to just use local copies in your kernel:

    class X {
      void run(sycl::queue& q){
        q.submit([&](sycl::handler& cgh){
          int local_var = var; // Note: This can also be expressed using the lambda capture list
          cgh.parallel_for(..., [=](...){ /* use local_var here*/});
        });
      }

      int var;
    };
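The difference between reaching a member through `this` and capturing a local copy can be seen in plain host C++, without a SYCL compiler. In this minimal sketch, `Holder`, `var` and `capture_demo` are hypothetical names. One lambda captures `this` explicitly, which is what a `[=]` lambda does implicitly when its body names a member. The other captures a copy of the member that is taken before the object is modified.

```cpp
#include <utility>

// Hypothetical class used only to illustrate capture semantics on the host.
struct Holder {
    int var = 1;

    // Capturing `this` is what `[=]` does implicitly when the body names a
    // member; the lambda then reads var through the pointer on every call.
    auto via_this() { return [this] { return var; }; }

    // Copying the member into a local first captures the value itself,
    // which is the pattern suggested for kernels.
    auto via_local_copy() {
        int local_var = var;
        return [=] { return local_var; };
    }
};

// Creates both lambdas, then mutates the object and reports what each sees.
// Returns {value seen through `this`, value seen through the local copy}.
inline std::pair<int, int> capture_demo() {
    Holder h;
    auto through_this = h.via_this();
    auto through_copy = h.via_local_copy();
    h.var = 42;  // changed after both lambdas were created
    return {through_this(), through_copy()};
}
```

`capture_demo()` returns `{42, 1}`. The first lambda reads the current value through the pointer, while the second keeps its snapshot. On a device only the second form is meaningful, because the pointer refers to host memory.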

Starting with C++17, you can also capture the object by copy: `[*this](...){...}`.
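Here is a host-only sketch of the C++17 `[*this]` form; `Widget` and `copy_capture_demo` are hypothetical names. The closure stores its own copy of the whole object, so later changes to the original are not visible to it. Note that for `Map` this would copy the `std::function` member as well, so on its own it does not avoid the problem described next.

```cpp
#include <utility>

// Hypothetical class: with [*this] (C++17) the closure holds its own copy
// of the whole object, so it no longer depends on the original's address.
struct Widget {
    int var = 1;
    auto snapshot() const { return [*this] { return var; }; }
};

// Returns {value seen by the [*this] closure, current value of the object}.
inline std::pair<int, int> copy_capture_demo() {
    Widget w;
    auto f = w.snapshot();
    w.var = 7;  // does not affect the copy stored inside f
    return {f(), w.var};
}
```

`copy_capture_demo()` returns `{1, 7}`.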

The more fundamental problem with your code is that the SYCL specification does not allow the use of `std::function` inside device code. In some cases and with some SYCL implementations it might work (e.g. with host backends), but this is an extension. The problem is that `std::function` typically implements type erasure with mechanisms that cannot be supported on the device, such as dynamic polymorphism (virtual function calls).
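The second error quoted in the comments ("kernel parameter has non-trivially copy constructible class/struct type 'std::function'") can be checked on the host with a standard type trait. As far as I understand SYCL 2020, kernel arguments, including captured variables, must be device copyable, which by default means trivially copyable. This sketch compares `std::function` with a lambda closure type; `erased_fn`, `make_adder` and `closure_t` are hypothetical names.

```cpp
#include <functional>
#include <type_traits>

// std::function has a user-provided copy constructor (it has to copy its
// type-erased target), so it is not trivially copyable.
using erased_fn = std::function<int(int)>;
inline constexpr bool std_function_ok =
    std::is_trivially_copyable_v<erased_fn>;

// A closure type that captures only an int is trivially copyable, so it can
// be passed to a kernel by value.
inline auto make_adder(int k) { return [k](int x) { return x + k; }; }
using closure_t = decltype(make_adder(1));
inline constexpr bool closure_ok = std::is_trivially_copyable_v<closure_t>;
```

Here `std_function_ok` is `false` and `closure_ok` is `true`. This is why copying `fun` into a local `std::function` still fails, while copying a concrete lambda does not.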

One solution is to make the type of the function a template parameter of the class instead of using `std::function`.
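Below is a minimal host-only sketch of that idea, assuming C++17. `Map` takes the callable's concrete type `F` as a third template parameter, and the hypothetical helper `make_map` deduces `F` from a lambda. A sequential loop stands in for the TBB and SYCL paths, which are only indicated in comments. The point is that `F` is a plain closure type that can be copied into a kernel, unlike `std::function`.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Map with the callable's concrete type F as a template parameter instead of
// std::function<Tout(Tin)>. Only a sequential host loop is implemented.
template <typename Tin, typename Tout, typename F>
class Map {
public:
    explicit Map(F f) : fun_(std::move(f)) {}

    void set_device(std::string dev) { device_type_ = std::move(dev); }

    std::vector<Tout> operator()(const std::vector<Tin>& v) const {
        std::vector<Tout> r(v.size());  // sized up front so r[i] is valid
        // Local copy: a kernel lambda can capture `f` by value instead of
        // reaching the member through `this`. In the question's SYCL path:
        //   h.parallel_for(n_item, [=](sycl::id<1> i) {
        //       out_accessor[i] = f(in_accessor[i]); });
        F f = fun_;
        for (std::size_t i = 0; i < v.size(); ++i) {
            r[i] = f(v[i]);
        }
        return r;
    }

private:
    F fun_;                    // concrete closure type, no type erasure
    std::string device_type_;  // kept to mirror the question; unused here
};

// Helper so callers do not have to spell out the lambda's type.
template <typename Tin, typename Tout, typename F>
Map<Tin, Tout, F> make_map(F f) {
    return Map<Tin, Tout, F>(std::move(f));
}
```

For example, `make_map<int, int>([](int x) { return ++x; })` applied to `{1, 2, 3, 4, 5, 6, 7, 8}` returns `{2, 3, 4, 5, 6, 7, 8, 9}`. The trade-off is that every lambda yields a distinct `Map` type, which is the price of avoiding type erasure.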

illuhad
  • Thank you for your response. The problem is that I tried to use a local copy of my function, like `auto f = fun;`, but it still throws an error: "kernel parameter has non-trivially copy constructible class/struct type 'std::function'" – Sahar Sa May 03 '21 at 15:50
  • As I've explained, it is not possible to use `std::function` in device code because it uses features which might not be supported on all devices that SYCL targets. – illuhad May 03 '21 at 15:57
  • Could you guide me on what I should do, please? – Sahar Sa May 03 '21 at 15:58
  • In the class template, I only have the output and input types of my function. As you said, should I also add the function type as a template parameter? – Sahar Sa May 03 '21 at 16:00
  • Here are some ideas for how you could implement this: https://godbolt.org/z/Yac8xPod1 Note: Please don't put `using namespace std; using namespace cl::sycl;` in your code. This is a recipe for disaster, as there might be name collisions between e.g. `sycl::queue` and `std::queue`. Also note that creating a new queue and new buffers on every invocation of `operator()` is going to be inefficient if you call it multiple times. – illuhad May 03 '21 at 19:55
  • Thank you very much, it was really helpful for me. – Sahar Sa May 04 '21 at 14:51