port SYCL/DPC++ code originally written for GPUs to FPGAs

Asked Jun 12 '22 at 23:16

Active Jun 17 '22 at 15:00


I'm fairly new to the world of FPGAs, and I'm trying to port some code written for GPUs to FPGAs so I can compare their performance.

From my understanding, using parallel_for isn't good practice on FPGAs (in fact, it runs very slowly). Instead, I think I should use a single_task with an unrolled for loop. I'm struggling to make it work properly, though.

So, I have

q.submit([&](sycl::handler &h) {
    h.parallel_for<class Foo>(sycl::nd_range<1>(n_blocks * n_threads, n_threads),
        [=](sycl::nd_item<1> it) {
            some_kernel(it, <other params here ...>);
        });
}).wait();
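For reference when translating the indices: in a 1-D nd_range launch with the default zero offset, the nd_item queries that some_kernel relies on are related as below, writing $G$ for n_blocks and $L$ for n_threads (the work-group size). This is the relation that has to be reproduced by hand once the nd_item is gone.

```latex
\begin{aligned}
\texttt{get\_global\_id}(0) &= \texttt{get\_group}(0)\cdot L + \texttt{get\_local\_id}(0),\\
0 \le \texttt{get\_group}(0) &< G, \qquad 0 \le \texttt{get\_local\_id}(0) < L,\\
\texttt{get\_global\_range}(0) &= G \cdot L .
\end{aligned}
```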

and my attempt is

q.submit([&](sycl::handler &h) {
    h.single_task<class Foo>([=]() {
        #pragma unroll
        for (int i = 0; i < n_blocks * n_threads; ++i)
            some_kernel(...);
    });
}).wait();
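One possible way to restructure the single_task version, sketched here as plain host C++ so the control flow can be checked without an FPGA toolchain: replace the flat loop with two nested loops, the outer one standing in for the work-group index and the inner one for the local id. The names kN_THREADS, body and emulate_single_task are hypothetical, and the sketch assumes the work-group size is a compile-time constant. That matters because a full `#pragma unroll` needs a known trip count; a runtime bound such as n_blocks * n_threads cannot be fully unrolled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compile-time work-group size (plays the role of n_threads).
constexpr std::size_t kN_THREADS = 4;

// Hypothetical stand-in for some_kernel's body: it receives the ids an
// nd_item would have provided and writes one output element.
void body(std::size_t group, std::size_t local, std::size_t global,
          std::vector<int>& out) {
    out[global] = static_cast<int>(group * 100 + local);
}

// Host emulation of the loop nest that would live inside single_task.
void emulate_single_task(std::size_t n_blocks, std::vector<int>& out) {
    assert(out.size() >= n_blocks * kN_THREADS);
    for (std::size_t group = 0; group < n_blocks; ++group) {        // ~ it.get_group(0)
        // In the FPGA kernel, `#pragma unroll` would go on this inner loop,
        // whose trip count kN_THREADS is known at compile time.
        for (std::size_t local = 0; local < kN_THREADS; ++local) {  // ~ it.get_local_id(0)
            const std::size_t global = group * kN_THREADS + local;  // ~ it.get_global_id(0)
            body(group, local, global, out);
        }
    }
}
```

With n_blocks = 3 and an output vector of 12 elements, the result is 0 1 2 3 100 101 102 103 200 201 202 203. One caveat on the design: if some_kernel uses work-group barriers, this nesting alone does not preserve their semantics, since the iterations run one after another; the code before and after each barrier would need to become separate inner loops.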

But I'm not sure how to adapt what I was previously doing with the sycl::nd_item. For instance, how do I use the loop index to replace the calls to methods such as get_group and get_local_id?
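If the single flat loop is kept instead, the ids can be recovered from the loop index i by integer division and remainder, inverting global = group * n_threads + local. A minimal host-side sketch follows; ItemIds and ids_from_flat are hypothetical helpers, not part of SYCL, and the sketch assumes some_kernel could be refactored to take these ids instead of the nd_item (its real signature isn't shown). On an FPGA, division and modulo by a runtime value may cost noticeably more hardware than the nested-loop form, so this mostly serves to check a translation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the 1-D nd_item queries used by some_kernel.
struct ItemIds {
    std::size_t group;   // ~ it.get_group(0)
    std::size_t local;   // ~ it.get_local_id(0)
    std::size_t global;  // ~ it.get_global_id(0)
};

// Recovers the work-group and local ids from the flat loop index i,
// assuming a zero offset and a work-group size of n_threads.
ItemIds ids_from_flat(std::size_t i, std::size_t n_threads) {
    assert(n_threads > 0);
    return ItemIds{i / n_threads, i % n_threads, i};
}
```

Inside the single_task loop, each iteration would call ids_from_flat(i, n_threads) and pass the resulting fields to the refactored kernel body.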

Should I change the design of the kernel entirely? In other words, is the "work-groups × work-group size" approach not appropriate for FPGAs?

edited Jun 17 '22 at 15:00


Elle

