I am developing OpenCL code for several devices. At the moment I am working with a Rockchip RK3588 (OpenCL device: Mali-G610 r0p0). The algorithm was originally written in CUDA, where the warp size is 32. In OpenCL the corresponding value is called the sub-group size (the number of work-items that execute together at the same time). As far as I understand, this value can also be read from CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
For example, on Intel GPUs I can set this value with __attribute__((intel_reqd_sub_group_size(32))). On the Mali-G610 r0p0, however, I get CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE = 16, and the program does not work correctly. I need to change this value to 32.
clinfo reports the following:
................
Preferred work group size multiple (device) 16
Preferred work group size multiple (kernel) 16
Max sub-groups per work group 64
................
Perhaps someone can help me with this?