I have a shader program in OpenGL ES. I want to choose the local and global workgroup sizes so that a compute shader can process a 1-dimensional task.

The total size of the task (the number of threads, which can change between runs) is [task_size]. I specify the local workgroup size, [local_size], and I know how many workgroups I dispatch, [workgroups]. I set the local size like this:

 layout(local_size_x = [local_size]) in;

And I specify number of workgroups in glDispatchCompute:

glDispatchCompute([workgroups], 1, 1);

If local_size * workgroups == task_size, I clearly understand what happens: each part of the task is computed by a separate workgroup.

But what happens if task_size is not evenly divisible by local_size? I understand that the minimum number of workgroups I need is then task_size / local_size + 1 (using integer division). But how does this work? Is the last workgroup actually smaller than the others? Does it affect performance? Is it a good idea to make task_size evenly divisible by local_size?
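
To make the arithmetic in the question concrete, here is a minimal sketch in C of the workgroup count and of how many invocations in the last workgroup would fall past the end of the task. The helper names `workgroup_count` and `surplus_invocations` are hypothetical and not part of any OpenGL ES API. The sketch assumes that every dispatched workgroup runs the full local_size invocations, since local_size_x is fixed in the shader source. It uses ceiling division rather than task_size / local_size + 1, because the latter yields one workgroup too many when task_size happens to be evenly divisible.

```c
#include <assert.h>

/* Hypothetical helper: smallest number of workgroups whose combined
 * invocations cover task_size items. This is ceiling division, so an
 * evenly divisible task_size does not get an extra workgroup.
 * Assumes task_size + local_size - 1 does not overflow unsigned int. */
static unsigned int workgroup_count(unsigned int task_size,
                                    unsigned int local_size)
{
    assert(local_size > 0u);
    return (task_size + local_size - 1u) / local_size;
}

/* Hypothetical helper: number of invocations in the last workgroup whose
 * global index is >= task_size, i.e. that have no item to work on. */
static unsigned int surplus_invocations(unsigned int task_size,
                                        unsigned int local_size)
{
    return workgroup_count(task_size, local_size) * local_size - task_size;
}
```

With the example values task_size = 1000 and local_size = 64, this gives 16 workgroups and 24 surplus invocations. The result would be passed as glDispatchCompute(workgroup_count(task_size, local_size), 1, 1). A common way to handle the surplus, assuming task_size is made available to the shader (for example as a uniform), is to have each invocation return early when gl_GlobalInvocationID.x >= task_size.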

BDL
sooobus

0 Answers