
In OpenCL 2.1, a work group is subdivided into subgroups. work_group_barrier() synchronizes all the work items in a work group, while sub_group_barrier() synchronizes only the work items in one subgroup.

Is it possible to synchronize the work items in a range of subgroups?

For example, a work group consists of 5 subgroups, each containing 64 work items. Subgroups 0 and 1 (= work items 0 - 127) should synchronize, so that after the barrier work items from subgroup 0 can access data written by subgroup 1. At the same time, subgroups 2, 3 and 4 could continue without participating in this synchronization, possibly executing a different part of the code.

In CUDA this is possible for warps (the equivalent of subgroups, 32 threads each), using inline PTX assembly: see "CUDA: how to use barrier.sync".
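For reference, here is a minimal sketch of the CUDA behaviour I mean, assuming a single block of 160 threads (5 warps) where only warps 0 and 1 meet at a named barrier. The names named_barrier_sync, partial_sync and run_partial_sync are my own for illustration, as are the barrier ID 1 and the buffer layout; only the bar.sync PTX instruction itself comes from CUDA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wait at named barrier `id` until `count` threads of
// this block have arrived. `count` must be a multiple of the warp size (32).
// Barrier 0 is the one used by __syncthreads(), so a different ID is used.
__device__ __forceinline__ void named_barrier_sync(unsigned id, unsigned count) {
    asm volatile("bar.sync %0, %1;" :: "r"(id), "r"(count) : "memory");
}

__global__ void partial_sync(int *out) {
    __shared__ int buf[64];
    const unsigned tid = threadIdx.x;

    if (tid < 64) {                    // warps 0 and 1 only (warp-uniform branch)
        buf[tid] = static_cast<int>(tid);
        named_barrier_sync(1, 64);     // only these 64 threads participate
        out[tid] = buf[63 - tid];      // read a value written by the other warp
    } else {
        out[tid] = -1;                 // warps 2..4 never touch the barrier
    }
}

// Hypothetical host wrapper: launches one block of 160 threads and copies
// the 160 results into h_out. Returns 0 on success.
int run_partial_sync(int *h_out) {
    const int n = 160;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return 1;
    partial_sync<<<1, n>>>(d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    int h_out[160];
    if (run_partial_sync(h_out) != 0) {
        std::printf("CUDA error\n");
        return 1;
    }
    std::printf("out[0]=%d out[63]=%d out[64]=%d\n",
                h_out[0], h_out[63], h_out[64]);
    return 0;
}
```

What I am looking for is an equivalent of named_barrier_sync(1, 64) for two 64-wide subgroups in an OpenCL kernel on AMD hardware.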

Is there a way to do this with OpenCL on the AMD platform, possibly using inline assembly code as well? If not, is there another GPGPU API/language for the AMD platform that would allow this?

Peter Cordes
tmlen
