I have an OpenGL compute shader that generates a number of vertices that is not known in advance and stores them in a shader storage buffer (SSB). The SSB is large enough that the compute shader never generates more vertices than it can hold. I need the generated values to fill the buffer from the beginning with no gaps (just like using push_back on a C++ vector). To do that, I use an atomic counter to obtain the index at which to place each vertex in the SSB when it is generated. This method seems to work, but it makes the compute shader run much slower. Here is what the GLSL function looks like:
void createVertex(/*some parameters*/)
{
    uint index = atomicCounterIncrement(numberOfVertices);
    Vector vertex;
    // some processing that calculates the coordinates of the vertex
    vertices[index] = vertex;
}
Where vertices is an SSB array of Vector structs (three floats each, used like a vec3), defined by:
struct Vector
{
    float x, y, z;
};
layout (std430, binding = 1) buffer vertexBuffer
{
    Vector vertices[];
};
And numberOfVertices is an atomic counter whose value is initialized to 0 before running the shader.
Once the shader has finished running, I can read the numberOfVertices value back on the CPU side to know how many vertices were created. They are stored in the buffer in the byte range [0; numberOfVertices*3*sizeof(float)).
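To make the byte range concrete, here is a minimal C++ sketch of how the used size could be computed on the CPU side once the counter has been read back. The helper name usedBytes and its capacityInVertices parameter are hypothetical (they are not part of my actual code), and the sketch assumes the CPU-side struct matches the std430 layout of the GLSL one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CPU-side mirror of the GLSL `Vector` struct (three floats).
struct Vector {
    float x, y, z;
};

// Under std430, an array of a struct that contains only three floats has a
// 12-byte stride, so the C++ struct must be tightly packed to match it.
static_assert(sizeof(Vector) == 3 * sizeof(float), "Vector must be tightly packed");

// Hypothetical helper: number of bytes occupied by `vertexCount` vertices,
// i.e. the end of the half-open byte range [0, usedBytes) in the buffer.
std::size_t usedBytes(std::uint32_t vertexCount, std::size_t capacityInVertices) {
    // The buffer is assumed to be large enough; this only checks that assumption.
    assert(vertexCount <= capacityInVertices);
    return static_cast<std::size_t>(vertexCount) * sizeof(Vector);
}
```

With a counter value of 4, for example, the first 4 * 3 * sizeof(float) bytes of the buffer would hold valid vertices and everything after that would be unused.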
When I measure the time the shader takes to run (with glBeginQuery/glEndQuery and GL_TIME_ELAPSED), I get about 50 ms. However, when I remove the atomicCounterIncrement line (and therefore also the assignment of the vertex into the array), the measured time drops to a few milliseconds. That gap grows as I increase the number of workgroups.
I think the problem may be caused by the atomic operation. Is there a better way to append values to an SSB, one that would also give me the total number of added values once the shader has finished running?
EDIT: After some refactoring and testing, I noticed that it is actually the assignment of values into the buffer (vertices[index] = vertex;) that slows everything down (the run takes about 40 ms less when this line is removed). I should mention that the createVertex() function is called inside a for loop whose iteration count differs between shader invocations.
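To illustrate the access pattern, here is a rough CPU-side analogy in C++, not the shader itself. Each thread stands in for one shader invocation that runs its loop a different number of times, and a std::atomic plays the role of the atomic counter. The function name appendFromWorkers and the way the buffer is sized are assumptions made for this sketch only.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

struct Vector {
    float x, y, z;
};

// Hypothetical stand-in for the compute shader: loopCounts[i] is the number of
// vertices that "invocation" i creates. Returns the densely packed vertices.
std::vector<Vector> appendFromWorkers(const std::vector<std::uint32_t>& loopCounts) {
    // Size the buffer so it can never overflow (the real SSB is simply "big enough").
    std::uint32_t capacity = 0;
    for (std::uint32_t n : loopCounts) capacity += n;

    std::vector<Vector> vertices(capacity);          // plays the role of the SSB
    std::atomic<std::uint32_t> numberOfVertices{0};  // plays the role of the atomic counter

    auto worker = [&](std::uint32_t id, std::uint32_t loops) {
        for (std::uint32_t i = 0; i < loops; ++i) {
            // Reserve a unique slot; like atomicCounterIncrement, fetch_add
            // returns the value held before the increment.
            std::uint32_t index = numberOfVertices.fetch_add(1, std::memory_order_relaxed);
            assert(index < vertices.size());
            vertices[index] = Vector{float(id), float(i), 0.0f};
        }
    };

    std::vector<std::thread> threads;
    for (std::uint32_t id = 0; id < loopCounts.size(); ++id)
        threads.emplace_back(worker, id, loopCounts[id]);
    for (auto& t : threads) t.join();

    // After all workers are done, the counter holds the number of valid entries.
    vertices.resize(numberOfVertices.load());
    return vertices;
}
```

Calling appendFromWorkers({3, 0, 5, 1}) would return 9 vertices with no gaps, in an order that depends on thread scheduling. This only mirrors the indexing logic; it says nothing about how a GPU schedules the writes or why they are slow.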