OpenCL barrier of a range of subgroups

Asked Sep 26 '19 at 14:50

Active Jun 28 '20 at 22:11

Viewed 276 times

In OpenCL 2.1, a work group is subdivided into subgroups. work_group_barrier() synchronizes all the work items in a work group, whereas sub_group_barrier() synchronizes only the work items in one subgroup.

Is it possible to synchronize the work items in a range of subgroups?

For example, a work group consists of 5 subgroups, each containing 64 work items. Subgroups 0 and 1 (= work items 0 - 127) should synchronize, so that after the barrier work items from subgroup 0 can access data written by subgroup 1. At the same time, subgroups 2, 3 and 4 could continue without participating in this synchronization, possibly executing a different part of the code.

In CUDA this is possible for warps (the equivalent of subgroups, 32 threads each) using inline PTX assembly; see the question "CUDA: how to use barrier.sync".
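To make the CUDA side concrete, here is a minimal sketch of what I mean, using the PTX named barrier bar.sync through inline assembly. This is my own illustration, not necessarily the code from the linked question: the names named_barrier_sync and partial_sync_kernel, the barrier ID 1 and the buffer layout are all hypothetical, and the sketch assumes a warp size of 32 and a block of 5 warps.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wait on named barrier `barrier_id` until `num_threads`
// threads of this block have arrived. IDs 1..15 are free to use here, since
// barrier 0 is the one __syncthreads() uses. `num_threads` must be a multiple
// of the warp size.
__device__ __forceinline__ void named_barrier_sync(unsigned barrier_id,
                                                   unsigned num_threads)
{
    asm volatile("bar.sync %0, %1;"
                 :: "r"(barrier_id), "r"(num_threads)
                 : "memory");
}

// Assumes blockDim.x == 5 * 32 and warpSize == 32.
__global__ void partial_sync_kernel(int *out)
{
    __shared__ int buf[32];
    const unsigned warp = threadIdx.x / warpSize;
    const unsigned lane = threadIdx.x % warpSize;

    if (warp < 2) {
        // Warp 1 produces data in shared memory.
        if (warp == 1) buf[lane] = 100 + lane;

        // Only warps 0 and 1 (64 threads) take part in this barrier.
        named_barrier_sync(1, 2 * warpSize);

        // Warp 0 can now read what warp 1 wrote.
        if (warp == 0) out[lane] = buf[lane];
    } else {
        // Warps 2, 3 and 4 run a different code path and never wait.
        out[threadIdx.x] = -1;
    }
}

int main()
{
    const int threads = 5 * 32;
    int h_out[threads];
    int *d_out = nullptr;

    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemset(d_out, 0, sizeof(h_out));
    partial_sync_kernel<<<1, threads>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("out[0] = %d, out[31] = %d\n", h_out[0], h_out[31]);
    return 0;
}
```

The branch is uniform per warp, so every thread of a participating warp reaches the same bar.sync instruction. What I am looking for is the equivalent of named_barrier_sync for 64-wide subgroups on AMD hardware.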

Is there a way to do this with OpenCL on the AMD platform, possibly using inline assembly code as well? If not, is there another GPGPU API/language for the AMD platform that would allow this?

edited Jun 28 '20 at 22:11

Peter Cordes

asked Sep 26 '19 at 14:50

tmlen


0 Answers