I have a shader program in OpenGL ES. I want to choose the local and global workgroup sizes so that a compute shader completes a 1-dimensional task.
The total size of the task (the total number of threads, which can change between runs) is [task_size]. The local workgroup size is [local_size], and the number of workgroups is [workgroups]. I specify the local size like this:
layout(local_size_x = [local_size]) in;
And I specify the number of workgroups in glDispatchCompute:
glDispatchCompute([workgroups], 1, 1);
If local_size * workgroups == task_size, I clearly understand what happens: each part of the task is computed by a separate workgroup.
But what happens if task_size is not evenly divisible by local_size? I understand that the minimum number of workgroups I need in that case is task_size / local_size + 1 (using integer division). But how does that work? Is the last workgroup actually smaller than the others? Does it affect performance? Is it a good idea to make task_size evenly divisible by local_size?
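To make the arithmetic concrete, here is a minimal host-side sketch in C of how I would round the workgroup count up. The helper names workgroup_count and idle_invocations are my own and not part of any OpenGL API, and the sketch assumes that the dispatch simply covers the task with whole workgroups, leaving some surplus invocations in the last one.

```c
#include <assert.h>

/* Hypothetical helper: number of workgroups needed to cover task_size
   invocations when each workgroup has local_size invocations.
   This is a ceiling division, so it also works when task_size is an
   exact multiple of local_size (no extra group is added then). */
static unsigned int workgroup_count(unsigned int task_size, unsigned int local_size)
{
    assert(local_size > 0);
    return task_size / local_size + (task_size % local_size != 0 ? 1u : 0u);
}

/* Hypothetical helper: how many invocations of the last workgroup have an
   index >= task_size, i.e. have no element of the task to work on. */
static unsigned int idle_invocations(unsigned int task_size, unsigned int local_size)
{
    return workgroup_count(task_size, local_size) * local_size - task_size;
}
```

For example, with task_size = 1000 and local_size = 64 this gives 16 workgroups, of which the last one has 24 invocations past the end of the task. The result of workgroup_count would be what I pass as the first argument to glDispatchCompute.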