I've been working on a project using OpenGL. Particles are rendered using instanced draw calls.
The issue is that glDrawElementsInstanced sometimes renders nothing, and no errors are reported. Other models and effects render fine, but none of the particles in my particle system appear. The draw call looks something like this:
ec(glBindVertexArray(vao));
ec(glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo));
ec(glDrawElementsInstanced(GL_TRIANGLES, triangleElementIndices.size(), GL_UNSIGNED_INT, reinterpret_cast<void*>(0), instanceCount));
ec is a macro used to error check OpenGL. It effectively does this:
while (GLenum error = glGetError()){
    std::cerr << "OpenGLError:" << std::hex << error << " " << functionName << " " << file << " " << line << std::endl;
}
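For anyone who wants to reproduce the error checking, here is a minimal sketch of how such a macro could be put together. It is not the exact macro from my project: `fakeGetError`, `pendingErrors`, and `drainErrors` are hypothetical names, and `fakeGetError` stands in for `glGetError` so the snippet runs without a GL context.

```cpp
#include <cassert>
#include <deque>
#include <iostream>
#include <ostream>
#include <sstream>

// Stand-ins for GLenum and glGetError so the sketch runs without a GL context.
using ErrorCode = unsigned int;
static std::deque<ErrorCode> pendingErrors; // hypothetical queue of pending error codes

ErrorCode fakeGetError() {
    if (pendingErrors.empty()) return 0; // 0 plays the role of GL_NO_ERROR
    ErrorCode e = pendingErrors.front();
    pendingErrors.pop_front();
    return e;
}

// Drains every pending error, logging where the checked call was made.
// Returns how many errors were reported.
int drainErrors(const char* functionName, const char* file, int line, std::ostream& out) {
    assert(functionName != nullptr && file != nullptr);
    int count = 0;
    while (ErrorCode error = fakeGetError()) {
        out << "OpenGLError:" << std::hex << error << std::dec
            << " " << functionName << " " << file << " " << line << "\n";
        ++count;
    }
    return count;
}

// Hypothetical wrapper: run the call, then report any errors it left behind.
#define ec(call) do { call; drainErrors(#call, __FILE__, __LINE__, std::cerr); } while (0)
```

The loop matters because more than one error flag can be pending, so a single check could leave stale errors that get blamed on a later call.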
The issue rendering particles is more prevalent in Release mode than in Debug mode, but it occurs in both. It occurs about 8/10 times in Release mode and 1/10 times in Debug mode.
Below is the rendering process for particles. For each instanced draw call:
- bind a shared vertex buffer object(vbo)
- put data into that vertex buffer object (vbo)
- iterate over many vertex array objects (vao), associate the VBO with them and set up vertex attributes
- render each vao
All of the objects share the same VBO, but they are rendered sequentially. The entire application is currently single threaded, so that shouldn't be an issue.
A given frame for particles A (two VAOs) and B (one VAO) would look like:
- buffer A's data into the vertex buffer named VBO
- bind A_vao1
- set up A's instance vertex attributes
- bind A_vao2
- set up A's instance vertex attributes
- render A_vao1
- render A_vao2
- buffer B's data into the vertex buffer named VBO (no glGenBuffers, this is the same buffer)
- bind B_vao1
- set up B's instance vertex attributes
- render B_vao1
Is there an obvious problem with that approach?
The source below has been simplified, but I left most of the relevant parts. Unlike what I have above, it actually uses 2 shared vertex buffer objects (VBOs), one for matrix4s, and one for vector4s.
GLuint instanceMat4VBO = ... //valid created vertex buffer objects
GLuint instanceVec4VBO = ... //valid created vertex buffer objects
//iterate over all the instances; data is stored in class EffectInstanceData
for(EffectInstanceData& eid : instancedEffectsData)
{
    if (eid.numInstancesThisFrame > 0)
    {
        // ---- BUFFER data ---- before binding it to all VAOs (models may have multiple meshes, each with their own VAO)
        ec(glBindBuffer(GL_ARRAY_BUFFER, instanceMat4VBO)); //BUFFER MAT4 INSTANCE DATA
        ec(glBufferData(GL_ARRAY_BUFFER, sizeof(glm::mat4) * eid.mat4Data.size(), &eid.mat4Data[0], GL_STATIC_DRAW));
        ec(glBindBuffer(GL_ARRAY_BUFFER, instanceVec4VBO)); //BUFFER VEC4 INSTANCE DATA
        ec(glBufferData(GL_ARRAY_BUFFER, sizeof(glm::vec4) * eid.vec4Data.size(), &eid.vec4Data[0], GL_STATIC_DRAW));
        //meshes may have multiple VAOs that need rendering, set up buffers with instance data for each VAO before instance rendering is done
        for (GLuint effectVAO : eid.effectData->mesh->getVAOs())
        {
            ec(glBindVertexArray(effectVAO));
            { //set up mat4 buffer
                ec(glBindBuffer(GL_ARRAY_BUFFER, instanceMat4VBO));
                GLsizei numVec4AttribsInBuffer = 4 * eid.numMat4PerInstance;
                size_t packagedVec4Idx_matbuffer = 0;
                //pass built-in data into instanced array vertex attribute
                {
                    //mat4 (these take 4 separate vec4s)
                    {
                        //model matrix
                        ec(glEnableVertexAttribArray(8));
                        ec(glEnableVertexAttribArray(9));
                        ec(glEnableVertexAttribArray(10));
                        ec(glEnableVertexAttribArray(11));
                        ec(glVertexAttribPointer(8, 4, GL_FLOAT, GL_FALSE, numVec4AttribsInBuffer * sizeof(glm::vec4), reinterpret_cast<void*>(packagedVec4Idx_matbuffer++ * sizeof(glm::vec4))));
                        ec(glVertexAttribPointer(9, 4, GL_FLOAT, GL_FALSE, numVec4AttribsInBuffer * sizeof(glm::vec4), reinterpret_cast<void*>(packagedVec4Idx_matbuffer++ * sizeof(glm::vec4))));
                        ec(glVertexAttribPointer(10, 4, GL_FLOAT, GL_FALSE, numVec4AttribsInBuffer * sizeof(glm::vec4), reinterpret_cast<void*>(packagedVec4Idx_matbuffer++ * sizeof(glm::vec4))));
                        ec(glVertexAttribPointer(11, 4, GL_FLOAT, GL_FALSE, numVec4AttribsInBuffer * sizeof(glm::vec4), reinterpret_cast<void*>(packagedVec4Idx_matbuffer++ * sizeof(glm::vec4))));
                        ec(glVertexAttribDivisor(8, 1));
                        ec(glVertexAttribDivisor(9, 1));
                        ec(glVertexAttribDivisor(10, 1));
                        ec(glVertexAttribDivisor(11, 1));
                    }
                }
            }
            { //set up vec4 buffer
                ec(glBindBuffer(GL_ARRAY_BUFFER, instanceVec4VBO));
                GLsizei numVec4AttribsInBuffer = eid.numVec4PerInstance;
                size_t packagedVec4Idx_v4buffer = 0;
                {
                    //package built-in vec4s
                    ec(glEnableVertexAttribArray(7));
                    ec(glVertexAttribPointer(7, 4, GL_FLOAT, GL_FALSE, numVec4AttribsInBuffer * sizeof(glm::vec4), reinterpret_cast<void*>(packagedVec4Idx_v4buffer++ * sizeof(glm::vec4))));
                    ec(glVertexAttribDivisor(7, 1));
                }
            }
        }
        //activate shader
        ... code setting uniforms on shaders, does not appear to be issue...
        //instanced render
        for (GLuint vao : eid.effectData->mesh->getVAOs()) //this actually results in function calls to mesh class instances, but effectively is doing this loop
        {
            ec(glBindVertexArray(vao));
            ec(glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo));
            ec(glDrawElementsInstanced(GL_TRIANGLES, triangleElementIndices.size(), GL_UNSIGNED_INT, reinterpret_cast<void*>(0), instanceCount));
        }
        //clear data for next frame
        eid.clearFrameData();
    }
}
ec(glBindVertexArray(0)); //unbind VAOs
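To make the stride and offset arithmetic in the glVertexAttribPointer calls easier to check, here is a small standalone sketch of what I believe those expressions evaluate to. `AttribLayout`, `mat4AttribLayout`, and `kVec4Bytes` are hypothetical names used only for illustration, and the sketch assumes `sizeof(glm::vec4)` equals four floats.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed to equal sizeof(glm::vec4): four tightly packed floats.
constexpr std::size_t kVec4Bytes = 4 * sizeof(float);

// Byte layout of one mat4 spread over four consecutive vec4 attributes
// (locations 8..11 in the code above).
struct AttribLayout {
    std::size_t strideBytes;                // distance between consecutive instances
    std::array<std::size_t, 4> offsetBytes; // start of each vec4 column within an instance
};

AttribLayout mat4AttribLayout(std::size_t numMat4PerInstance, std::size_t firstVec4Index = 0) {
    assert(numMat4PerInstance > 0);
    AttribLayout layout{};
    // Mirrors numVec4AttribsInBuffer * sizeof(glm::vec4), with numVec4AttribsInBuffer = 4 * numMat4PerInstance.
    layout.strideBytes = 4 * numMat4PerInstance * kVec4Bytes;
    // Mirrors packagedVec4Idx_matbuffer++ * sizeof(glm::vec4) for each of the four calls.
    for (std::size_t i = 0; i < 4; ++i) {
        layout.offsetBytes[i] = (firstVec4Index + i) * kVec4Bytes;
    }
    return layout;
}
```

With one mat4 per instance and 4-byte floats, this gives a 64-byte stride and offsets 0, 16, 32, and 48, which is what I expect the four attribute pointers to use.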
Is any of this visibly wrong? I've debugged with RenderDoc, and when the issue is not present, the draw call is present in the event browser, like the image:
[image: RenderDoc event browser with the draw call present]
But when the issue does happen, the draw call does not appear at all in RenderDoc, like the following image:
[image: RenderDoc event browser without the draw call]
This seems very strange to me. I've verified with the debugger that the draw call is being executed. But it seems to silently fail.
I've tried debugging with NVIDIA Nsight, but I cannot reproduce the issue when the application is launched through Nsight.
I've verified:
- the instance VBO buffer size doesn't change or grow too large; its size is stable
- uniforms are correctly finding values
- VAO binding appears to happen in the correct order
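One further check that could sit alongside the list above is whether the instance count passed to the draw call fits in the data that was actually buffered. This is only a sketch under assumptions: `instanceDataFits` and its parameters are hypothetical names, and it assumes an attribute divisor of 1, so instance i reads `attribBytes` bytes starting at `attribOffsetBytes + i * strideBytes`.

```cpp
#include <cassert>
#include <cstddef>

// Returns true if every instance's attribute read stays inside the buffer.
// Assumes divisor 1: instance i reads attribBytes bytes starting at
// attribOffsetBytes + i * strideBytes.
bool instanceDataFits(std::size_t bufferBytes,
                      std::size_t strideBytes,
                      std::size_t instanceCount,
                      std::size_t attribOffsetBytes,
                      std::size_t attribBytes) {
    if (instanceCount == 0) {
        return true; // nothing is read (and nothing is drawn either)
    }
    // Last byte needed is the end of the attribute for the final instance.
    const std::size_t requiredBytes =
        (instanceCount - 1) * strideBytes + attribOffsetBytes + attribBytes;
    return requiredBytes <= bufferBytes;
}
```

For example, with a 64-byte stride and the last vec4 column at offset 48, two instances need exactly 128 bytes, so a 128-byte buffer passes for two instances and fails for three.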
System specs: Windows 10; OpenGL 3.3; 8 GB memory; i7-8700K; NVIDIA GeForce GTX TITAN X
I've also observed the issue on my laptop, with roughly the same reproduction rates. It has an Intel graphics chip.
GitHub link to the actual source. If anyone tries to compile it, let me know; you need to replace the hidden .suo with the copy I made to automatically fill out the linker settings. The relevant function is ParticleSystem::handlePostRender.