Questions tagged [sycl]

SYCL (pronounced ‘sickle’) is a royalty-free, cross-platform abstraction layer that builds on the underlying concepts, portability and efficiency of OpenCL and enables code for heterogeneous processors to be written in a “single-source” style using completely standard C++. SYCL is developed by the Khronos Group.

SYCL single-source programming enables the host and kernel code for an application to be contained in the same source file, in a type-safe way and with the simplicity of a cross-platform asynchronous task graph. SYCL includes templates and generic lambda functions to enable higher-level application software to be cleanly coded with optimized acceleration of kernel code across the extensive range of shipping OpenCL 1.2 implementations.

Developers program at a higher level than OpenCL C or C++, but always have access to lower-level code through seamless integration with OpenCL, C/C++ libraries, and frameworks such as OpenCV™ or OpenMP™.

Implementations of SYCL include ComputeCpp and triSYCL.

151 questions
0
votes
1 answer

How to reduce the time cost of parallel_for in DPC++?

I've written the following code in DPC++ to test time consumption. // ignore sth for defining subdevices cl::sycl::queue q[4] = {cl::sycl::queue{SubDevices1[0]}, cl::sycl::queue{SubDevices1[1]}, cl::sycl::queue{SubDevices2[0]},…
lastans7
  • 143
  • 5
0
votes
1 answer

Trying to implement 2d array addition in DPC++

I am learning DPC++ and trying to implement a 2d array matrix program. I am stuck partway through the program. Please check the code below and help me. #include #include #define N 2 using namespace sycl; int main(){ int…
0
votes
0 answers

gpu_selector is giving runtime error like CL_DEVICE_NOT_FOUND, in sycl dpc++

On my Ubuntu 20.04 machine, we have installed Intel oneAPI DPC++. The version is: Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.1.0.20220316). The machine has an NVIDIA GPU (found with the command nvidia-smi). There we have NVIDIA GeForce ...…
0
votes
1 answer

cannot capture the struct value inside of the kernel function

It is so strange, and I have been struggling with this problem for the whole week. I just want to use the variable which is defined inside of the struct constructor, but I fail to do that. The simple code is here: #include #include…
0
votes
1 answer

Weird behavior of dpc++ code after running it on FPGA device

I am using DPC++ to accelerate the KNN algorithm on an FPGA device. The following code is what I wrote for the Euclidean distance. The problem is that the fpga_emulation works very well with no problems, while running it on FPGA hardware (Intel Arria 10…
Amal Taha
  • 3
  • 4
0
votes
0 answers

Parallel for is very slow compared to iterative solution

I am trying to accelerate an algorithm using DPC++. What happens is that the normal calculation runs 1.5 times faster than the parallel kernel execution. The following code is for both calculations. The num_items currently equals 16,000. I tried small…
Amal Taha
  • 3
  • 4
0
votes
1 answer

SYCL kernel cannot call an undefined function without SYCL_EXTERNAL attribute

I am trying to calculate the Euclidean distance for KNN, but in parallel using DPC++. The training dataset contains 5 features and 1600 rows, while I want to calculate the distance between the current test point and each training point on the grid in…
0
votes
0 answers

DPC++ access the nonconst size buffer or access the shared memory pointer in class using MPI

I am trying to develop code based on MPI & DPC++ for large-scale simulation. The problem can be summarized as: I want to declare the data size, allocate the data memory inside of my class constructor, and then try to use them in the functions inside of…
0
votes
1 answer

dpc++ start the do loop from 1 to n-2 using parallel_for range

Is it possible to start the loop at index 1 and end it at n-2 using DPC++ parallel_for? h.parallel_for(range{lx , ly }, [=](id<2> idx this will give a loop from 0 to lx-1, and I have to do idx[0]>0 && idx[1]>0 && idx[0]
0
votes
1 answer

DPC++ & MPI, buffer, shared memory, variable declare

I am new to DPC++, and I am trying to develop an MPI-based DPC++ Poisson solver. I read the book and am very confused about the buffer and the pointer with the shared or host memory. What is the difference between those two things, and what should I use…
0
votes
1 answer

QtCreator with Intel OneAPI SYCL

I started studying oneAPI SYCL, but I normally use QtCreator as my IDE. I made a HelloSYCL project with CMake, and it works fine in the terminal and in VS Code with the oneAPI extension as well, but it didn't work in QtCreator. Every time I want to…
0
votes
1 answer

SYCL program working using VS Debugger but not when running the .exe

I am trying to build and run a simple SYCL program from this book. Here it is: #include #include using namespace sycl; const std::string secret { "Ifmmp-!xpsme\"\012J(n!tpssz-!Ebwf/!" …
Balfar
  • 125
  • 7
0
votes
2 answers

Intel MKL ERROR: incorrect parameter when calling gemm()

I have this code: void my_function(double *image_vector, double *endmembers, double *abundanceVector, int it, int lines, int samples, int bands, int targets) { double *h_Num; double *h_aux; double *h_Den; int lines_samples =…
gamersensual
  • 105
  • 6
0
votes
1 answer

Optimize member function selection at runtime on CPU/GPU

I have the following piece of code that needs to be optimized (and be later ported to the GPU through SYCL or ArrayFire): struct Item { float value; int f; float Func(float); float Func1(float); float Func2(float); float…
Pietro
  • 12,086
  • 26
  • 100
  • 193
0
votes
1 answer

Declaring Half precision floating point memory in SYCL

I would like to know and understand how one can declare half-precision buffers and pointers in SYCL, namely in the following ways: via the buffer class, and using the malloc_device() function. Also, suppose I have an existing fp32 matrix / array on the…
Atharva Dubey
  • 832
  • 1
  • 8
  • 25