So I wrote a really simple OpenGL program that draws 100x100x100 points expanded into cubes by a geometry shader. I wanted to benchmark it against what I can currently do with DirectX11.
With DirectX11, I can easily render these cubes at 60fps (vsync capped). With OpenGL, however, I'm stuck at 40fps.
In both applications, I am:
- Using a point topology to represent just the position of each cube (stride = 12 bytes); a sketch of this setup follows the list.
- Mapping the vertex buffer only once, in the initialise function.
- Using only two draw calls in total: one to render the cubes, one to render the frametime text.
- Using back-face culling and depth testing.
- Limiting state changes to the minimum I need to draw the cubes (VBOs/shader program).
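For context, that one-time setup amounts to roughly this (a sketch, not my actual wrapper classes; it assumes a std::vector<float> positions holding the 1,000,000 cube positions):

    // One-time initialise: single VAO/VBO, positions mapped and filled once
    GLuint vao = 0, vbo = 0;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, positions.size() * sizeof(float), nullptr, GL_STATIC_DRAW);

    // The only map of the vertex buffer, done once at start-up
    void* pData = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    std::memcpy(pData, positions.data(), positions.size() * sizeof(float));
    glUnmapBuffer(GL_ARRAY_BUFFER);

    // One vec3 position per cube: stride = 12 bytes
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 12, nullptr);

    glEnable(GL_CULL_FACE);   // back-face culling
    glEnable(GL_DEPTH_TEST);  // depth testing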
Here is my draw call:
GLboolean CCubeApplication::Draw()
{
    auto program = m_ppBatches[0]->GetShaders()->GetProgram(0);
    program->Bind(); // binds the VAO and the shader program (see below)
    {
        // Upload the combined world-view-projection matrix
        glUniformMatrix4fv(program->GetUniform("g_uWVP"), 1, false, glm::value_ptr(m_matMatrices[MATRIX_WVP]));
        // One point per cube; the geometry shader expands each into 12 triangles
        glDrawArrays(GL_POINTS, 0, m_uiTotal);
    }
    return true;
}
The program->Bind() call is what issues glBindVertexArray and glUseProgram.
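Roughly, it amounts to this (a sketch; the member names here are made up, not my actual wrapper):

    void CProgram::Bind()
    {
        glBindVertexArray(m_uiVAO); // hypothetical member holding the VAO handle
        glUseProgram(m_uiProgram);  // hypothetical member holding the program handle
    }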
And the rest is straightforward. My Update function does nothing but update the camera's position and view matrix, and it is identical in the DirectX and OpenGL versions.
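For illustration, the per-frame work is no more than rebuilding the matrices, along these lines (a sketch using glm, which I already use for value_ptr; the member names and projection values are illustrative):

    void CCubeApplication::Update()
    {
        // Rebuild the view matrix from the camera, then the combined WVP
        glm::mat4 view = glm::lookAt(m_vEye, m_vTarget, glm::vec3(0.0f, 1.0f, 0.0f));
        glm::mat4 proj = glm::perspective(glm::radians(60.0f), m_fAspect, 0.1f, 1000.0f);

        // World is identity here: the per-cube offsets are applied in the geometry shader
        m_matMatrices[MATRIX_WVP] = proj * view;
    }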
My vertex shader is a pass-through, and my fragment shader returns a constant colour. This is my geometry shader:
#version 440 core

// GS_LAYOUT
layout(points) in;
layout(triangle_strip, max_vertices = 36) out;

// GS_IN
in vec4 vOut_pos[];

// GS_OUT
// UNIFORMS
uniform mat4 g_uWVP;

// Half the cube's edge length
const float f = 0.1f;

// Corner indices, three per triangle, twelve triangles
const int elements[] = int[]
(
    0,2,1,
    2,3,1,
    1,3,5,
    3,7,5,
    5,7,4,
    7,6,4,
    4,6,0,
    6,2,0,
    3,2,7,
    2,6,7,
    5,4,1,
    4,0,1
);

// GS
void main()
{
    // The eight cube corners, offset from the input point and taken to clip space
    vec4 vertices[] = vec4[]
    (
        g_uWVP * (vOut_pos[0] + vec4(-f,-f,-f, 0)),
        g_uWVP * (vOut_pos[0] + vec4(-f,-f,+f, 0)),
        g_uWVP * (vOut_pos[0] + vec4(-f,+f,-f, 0)),
        g_uWVP * (vOut_pos[0] + vec4(-f,+f,+f, 0)),
        g_uWVP * (vOut_pos[0] + vec4(+f,-f,-f, 0)),
        g_uWVP * (vOut_pos[0] + vec4(+f,-f,+f, 0)),
        g_uWVP * (vOut_pos[0] + vec4(+f,+f,-f, 0)),
        g_uWVP * (vOut_pos[0] + vec4(+f,+f,+f, 0))
    );

    // Emit 12 one-triangle strips (36 vertices total)
    uint uiIndex = 0;
    for (uint uiTri = 0; uiTri < 12; ++uiTri)
    {
        for (uint uiVert = 0; uiVert < 3; ++uiVert)
        {
            gl_Position = vertices[elements[uiIndex++]];
            EmitVertex();
        }
        EndPrimitive();
    }
}
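For completeness, the vertex and fragment stages are trivial; they amount to something like this (a sketch, not my exact files, shown as the C++ string literals I'd feed to glShaderSource):

    // Pass-through vertex shader: forwards the point position to the GS
    const char* kVS = R"(#version 440 core
    layout(location = 0) in vec3 vIn_pos;
    out vec4 vOut_pos; // matches the GS input above
    void main()
    {
        vOut_pos = vec4(vIn_pos, 1.0); // no transform; the GS applies g_uWVP
    }
    )";

    // Fragment shader: constant colour
    const char* kFS = R"(#version 440 core
    out vec4 fOut_colour;
    void main()
    {
        fOut_colour = vec4(1.0); // constant white (illustrative)
    }
    )";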
I've seen people suggest instancing and other rendering methods, but I'm primarily interested in understanding why I can't get at least the same performance from OpenGL as I do from DirectX, seeing as the two implementations seem virtually identical to me: identical data, identical shaders. Help?
UPDATE: So I downloaded gDEBugger, and here is the call trace for one frame:
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
// Drawing cubes
glBindVertexArray(1)
glUseProgram(1)
glUniformMatrix4fv(0, 1, FALSE, {matrixData})
glDrawArrays(GL_POINTS, 0, 1000000)
// Drawing text
glBindVertexArray(2)
glUseProgram(5)
glActiveTexture(GL_TEXTURE0)
glBindTexture(GL_TEXTURE_2D, 2)
glBindBuffer(GL_ARRAY_BUFFER, 2)
glBufferData(GL_ARRAY_BUFFER, 212992, {textData}, GL_DYNAMIC_DRAW)
glDrawArrays(GL_POINTS, 0, 34)
// Swap buffers
wglSwapBuffers()