
I have an OpenGL compute shader that generates an unknown number of vertices and stores them in a shader storage buffer (SSB). The SSB's capacity is large enough that the compute shader never generates more vertices than it can hold. I need the generated values to fill the buffer from the beginning, with no gaps (just like calling push_back on a C++ vector). To do that, I use an atomic counter to obtain the index at which each newly generated vertex is stored in the SSB. This method seems to work, but it makes the compute shader run much slower. Here is what the GLSL function looks like:

void createVertex(/*some parameters*/){
    uint index = atomicCounterIncrement(numberOfVertices); // returns the counter value before the increment

    Vector vertex;
    // some processing that calculates the coordinates of the vertex

    vertices[index] = vertex;
}

Where vertices is an SSB of Vector structs, defined by:

struct Vector
{
    float x, y, z;
};

layout (std430, binding = 1) buffer vertexBuffer
{
    Vector vertices[];
};

And numberOfVertices is an atomic counter buffer whose value is initialized to 0 before the shader runs.
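The append pattern described above can be mimicked on the CPU to check the indexing logic. This is a minimal C++ sketch, not the shader itself: std::atomic::fetch_add plays the role of atomicCounterIncrement (both return the value before the increment), threads stand in for shader invocations, and the workload of (i % 13) vertices per invocation as well as the helper runAppend are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// CPU-side analogue of the shader's Vector struct (three tightly packed floats).
struct Vector {
    float x, y, z;
};

// Plays the role of the atomic counter buffer numberOfVertices.
std::atomic<std::uint32_t> numberOfVertices{0};

// Mirrors createVertex(): reserve one slot, then fill it.
void createVertex(std::vector<Vector>& vertices, float value) {
    // fetch_add returns the previous value, like atomicCounterIncrement.
    std::uint32_t index = numberOfVertices.fetch_add(1, std::memory_order_relaxed);
    vertices[index] = Vector{value, value, value};
}

// Runs `invocations` threads; invocation i appends (i % 13) vertices, i.e. 0..12.
// This workload is hypothetical and only stands in for the real per-invocation loop.
std::vector<Vector> runAppend(unsigned invocations, std::size_t capacity) {
    std::vector<Vector> vertices(capacity);  // capacity must exceed the total appended
    numberOfVertices.store(0);               // like resetting the counter before dispatch
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < invocations; ++i) {
        threads.emplace_back([&vertices, i] {
            for (unsigned v = 0; v < i % 13; ++v) {
                createVertex(vertices, static_cast<float>(i));
            }
        });
    }
    for (auto& t : threads) t.join();
    vertices.resize(numberOfVertices.load());  // keep only [0, count), like the readback
    return vertices;
}
```

Every thread ends up owning a distinct set of slots, so the filled region is contiguous, but the order in which invocations' vertices appear is not deterministic.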

Once the shader has finished running, I read numberOfVertices back on the CPU side to know how many vertices were created; they are stored in the byte range [0, numberOfVertices*3*sizeof(float)) of the buffer. When I measure the time the shader takes to run (with glBeginQuery/glEndQuery(GL_TIME_ELAPSED)), I get about 50 ms. However, when I remove the atomicCounterIncrement line (and therefore also stop assigning the vertex into the array), the measured time drops to a few milliseconds. The gap grows as I increase the number of workgroups.
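The byte range above follows from the std430 layout of the struct: because Vector contains only floats, its base alignment is 4 bytes and the array stride is 12 bytes, so numberOfVertices*3*sizeof(float) equals numberOfVertices*sizeof(Vector) on the C++ side. A plain vec3 array under std430 would instead use a 16-byte stride. A small sketch, assuming a 4-byte float (checked by static_assert); the helper names usedBytes and usedBytesIfVec3Array are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Matches the GLSL struct Vector { float x, y, z; } under std430:
// the struct's alignment is that of float (4 bytes), so the array stride is 12.
struct Vector {
    float x, y, z;
};
static_assert(sizeof(float) == 4, "assumes 32-bit floats, as in GLSL");
static_assert(sizeof(Vector) == 3 * sizeof(float), "Vector must be tightly packed");

// Bytes of the SSB holding valid data, given the value read back from the counter.
std::size_t usedBytes(std::uint32_t numberOfVertices) {
    return static_cast<std::size_t>(numberOfVertices) * sizeof(Vector);
}

// For comparison: a std430 "vec3 vertices[]" array has a 16-byte element stride.
std::size_t usedBytesIfVec3Array(std::uint32_t numberOfVertices) {
    constexpr std::size_t kVec3ArrayStrideStd430 = 16;
    return static_cast<std::size_t>(numberOfVertices) * kVec3ArrayStrideStd430;
}
```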

I think the problem may be caused by the atomic operation. Is there a better way to append values to an SSB, one that also gives me the total number of appended values once the shader has finished running?

EDIT: After some refactoring and testing, I noticed that it is actually the assignment of values into the buffer (vertices[index] = vertex;) that slows everything down (removing this line saves about 40 ms). I should mention that createVertex() is called inside a for loop whose number of iterations differs between shader invocations.
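Since createVertex() runs inside a loop producing 0 to 12 vertices per invocation, one commonly used variation is to build an invocation's vertices locally and reserve their whole range with a single atomic add, instead of one atomic increment per vertex. In GLSL this would mean atomicAdd on a uint stored in an SSBO (GLSL 4.30+) or atomicCounterAdd (GLSL 4.60 or ARB_shader_atomic_counter_ops). The sketch only models the idea on the CPU in C++; it reduces the number of atomic operations but does not by itself change the cost of the buffer writes. The per-invocation counts and the names runInvocation and runBatchedAppend are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

struct Vector {
    float x, y, z;
};

constexpr std::uint32_t kMaxVerticesPerInvocation = 12;

// One simulated invocation: build up to 12 vertices locally, then reserve a
// contiguous range with a single fetch_add (one atomic op instead of up to 12).
void runInvocation(unsigned id, std::uint32_t vertexCount,
                   std::atomic<std::uint32_t>& counter, std::vector<Vector>& vertices) {
    std::array<Vector, kMaxVerticesPerInvocation> local{};
    for (std::uint32_t v = 0; v < vertexCount; ++v) {
        // Placeholder for the real coordinate computation.
        local[v] = Vector{static_cast<float>(id), static_cast<float>(v), 0.0f};
    }
    if (vertexCount == 0) return;  // nothing to append, so skip the atomic entirely
    std::uint32_t base = counter.fetch_add(vertexCount, std::memory_order_relaxed);
    for (std::uint32_t v = 0; v < vertexCount; ++v) {
        vertices[base + v] = local[v];  // this invocation owns [base, base + vertexCount)
    }
}

std::vector<Vector> runBatchedAppend(unsigned invocations, std::size_t capacity) {
    std::vector<Vector> vertices(capacity);
    std::atomic<std::uint32_t> counter{0};
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < invocations; ++i) {
        std::uint32_t n = i % (kMaxVerticesPerInvocation + 1);  // hypothetical 0..12
        threads.emplace_back(runInvocation, i, n, std::ref(counter), std::ref(vertices));
    }
    for (auto& t : threads) t.join();
    vertices.resize(counter.load());  // the final counter value is the total count
    return vertices;
}
```

The final counter value still gives the total number of appended vertices, and each invocation's vertices end up adjacent in the buffer.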

Krafpy
  • If every shader invocation is definitely going to append a value, then there's no need for an atomic counter. – Nicol Bolas Aug 08 '20 at 13:38
  • @NicolBolas Each invocation can append between 0 and 12 vertices. – Krafpy Aug 08 '20 at 13:56
  • @Rabbid76 I know that vec3s are always stored as vec4s; I just wrote that for the example, and I'll edit the question to make it less confusing. In my real code I actually use a `struct Vector { float x, y, z; }`. My whole shader works, but it is still incredibly slow compared to what it should be. I noticed the problem comes from assigning the value in the buffer: when I remove this line, the program goes from 60 ms to 12 ms. I have no idea how to fix that. – Krafpy Aug 13 '20 at 16:54
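Nicol Bolas's first comment can be made concrete: if every invocation always wrote exactly the same number of vertices K, invocation id (for example gl_GlobalInvocationID.x) could write to the fixed range [id*K, id*K + K) with no counter at all. With 0 to 12 vertices per invocation, the same scheme would leave gaps, which is why the question needs a counter. A minimal C++ sketch of the index computation, where the loop over id stands in for the dispatched invocations and fixedSlotWrite is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vector {
    float x, y, z;
};

// If every invocation always writes exactly perInvocation vertices, its output
// range can be derived from its ID alone, so no atomic counter is needed.
std::vector<Vector> fixedSlotWrite(std::uint32_t invocations, std::uint32_t perInvocation) {
    std::vector<Vector> vertices(static_cast<std::size_t>(invocations) * perInvocation);
    for (std::uint32_t id = 0; id < invocations; ++id) {  // stands in for gl_GlobalInvocationID.x
        for (std::uint32_t v = 0; v < perInvocation; ++v) {
            std::size_t index = static_cast<std::size_t>(id) * perInvocation + v;
            vertices[index] = Vector{static_cast<float>(id), static_cast<float>(v), 0.0f};
        }
    }
    return vertices;
}
```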
