OpenGL newbie here. I am learning OpenGL and I'm trying to figure out a workaround for VBOs. Let me explain: if you load an .obj file you already have everything set up: vertex coordinates, normals, UVs and their indices. Why is this not enough to draw triangles on the screen?

The reason I'm asking is that VBO creation is the biggest bottleneck in my pipeline. For instance, I can create procedural sphere data with a million vertices, plus its normals and UVs, on the CPU and split each of these into its own buffer/array. I can do this using multithreading and SIMD instructions. But once everything is set up I still have to interleave this data for the GPU, and that takes much more time than creating the organized data in the first place. I can't even use multithreading, because data in the VBO gets repeated: a cube with 8 vertex coordinates has 24 normals, so I have to repeat each of those 8 vertices 3 times in the VBO.

What is the exact reason we have to use a VBO? Wouldn't splitting it into smaller buffers for coords, normals and UVs actually reduce memory usage and improve performance?
-
"VBO creation is the biggest bottleneck in the pipeline": are you creating VBOs each frame? You should create them only once, at the beginning of the application. Then each frame you just "use" the VBOs. – tuket Mar 03 '21 at 14:12
-
I am talking about a mesh that should/could be edited multiple times, a mesh that is being created/edited in Maya/Blender, not a static mesh that is just imported into a game engine. To create or edit this kind of mesh there should be several buffers the user can modify, something like: 1) vtx_pos, 2) normals, 3) uvs... and when that is done the data should be interleaved into a VBO to actually be drawn on the screen. – Nemanja Stojanovic Mar 03 '21 at 14:58
-
So if the user wants to get data from the vtx_pos buffer, apply some kind of modifier and write the data back, that can be done lightning fast using multithreading and SIMD instructions. But then we come to the point where we have to organize this data into a VBO, and that becomes the bottleneck. This setup/organization of the data is much slower than doing the "hard stuff". I hope I've explained it a bit better. – Nemanja Stojanovic Mar 03 '21 at 14:58
-
You have to use a VBO because you have to use a buffer, and any buffer used for vertex data is called a VBO; that's what the "V" and "B" stand for. There's no way to render vertices from a buffer (which is an object) without having a VBO. – user253751 Mar 03 '21 at 15:29
-
Please provide some code that can be used to outline/demonstrate the problem. There shouldn't be any need to constantly recreate VBOs. If you need to update buffer data, why not use [`glBufferSubData`](https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glBufferSubData.xhtml), for example? – G.M. Mar 03 '21 at 15:29
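(For illustration, a minimal sketch of what updating only the positions could look like, assuming the buffer, `posVbo` here, a made-up name, was already allocated with `glBufferData`:)

glBindBuffer(GL_ARRAY_BUFFER, posVbo);
// overwrite the existing contents in place, without reallocating the buffer
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(positions), positions);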
-
Basically, I am creating procedural geometry. Everything is created from scratch: vtx_coords, normals, uvs, textures. The problem is that all of that heavy calculation is about 5% of the execution time; 95% is spent on VBO setup and actually drawing the object on the screen. I had this issue in Maya, then I had this exact same issue in Qt 3D once I started porting my tools to it, and now I've figured out that the problem is all the way under the hood in OpenGL. If this is the only format the GPU can understand, then I'm stuck. VBO reorganization just becomes a massive bottleneck for me. – Nemanja Stojanovic Mar 03 '21 at 15:50
-
OpenGL can understand almost any format, actually. It is not a requirement to have the data interleaved. You could have 3 different VBOs (pos, normal, texCoord) and then use `glBindBuffer` & `glVertexAttribPointer` to tell OpenGL which VBO it should read each attribute from. That way you wouldn't need to do the data format transformation. Would that fix your problem? – tuket Mar 03 '21 at 17:05
-
Actually, you don't need 3 different VBOs, you could have only one: first you have all the positions, then all the normals, then all the texCoords. In `glVertexAttribPointer` you would specify the offset to the start of the attribute data as the last parameter, `pointer`: https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glVertexAttribPointer.xhtml – tuket Mar 03 '21 at 17:16
-
Awesome. That is exactly what I was looking for! Edit: I don't know how to mark your comment as an answer. – Nemanja Stojanovic Mar 03 '21 at 17:17
-
However, I would still have to test whether these "segments" in the main buffer have to match. Like, do I have to repeat the 8 vertices of a cube 3 times each to match the 24 normals, because I would use a single index buffer? The ideal solution would be to have 3 buffers for coords, normals and uvs, and 3 index buffers, one for each. Then the first 3 wouldn't have to match in size, but obviously the index buffers would. – Nemanja Stojanovic Mar 03 '21 at 17:23
-
You would have 3 VBOs but only one EBO (index buffer). I will write an answer with an example. – tuket Mar 03 '21 at 17:39
1 Answer
Currently you are interleaving the vertex data on the CPU, but this is not a requirement. You can give OpenGL the data in the layout you already have by using `glVertexAttribPointer`.
Example 1: using 3 VBOs
float positions[] = {...};
float normals[] = {...};
float texCoords[] = {...};
unsigned indices[] = {...};
GLuint vbo[3];
glGenBuffers(3, vbo);
// positions
glBindBuffer(GL_ARRAY_BUFFER, vbo[0]);
glBufferData(GL_ARRAY_BUFFER, sizeof(positions), positions, GL_STREAM_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);
// normals
glBindBuffer(GL_ARRAY_BUFFER, vbo[1]);
glBufferData(GL_ARRAY_BUFFER, sizeof(normals), normals, GL_STREAM_DRAW);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, nullptr);
// tex coords
glBindBuffer(GL_ARRAY_BUFFER, vbo[2]);
glBufferData(GL_ARRAY_BUFFER, sizeof(texCoords), texCoords, GL_STREAM_DRAW);
glEnableVertexAttribArray(2);
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 0, nullptr);
// indices
GLuint ebo;
glGenBuffers(1, &ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STREAM_DRAW);
// draw
glDrawElements(GL_TRIANGLES, numInds, GL_UNSIGNED_INT, nullptr); // numInds = number of indices; the last parameter is nullptr because the indices come from the bound EBO
Example 2: using a single VBO
float positions[] = {...};
float normals[] = {...};
float texCoords[] = {...};
unsigned indices[] = {...};
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
const int totalSize = sizeof(positions) + sizeof(normals) + sizeof(texCoords);
glBufferData(GL_ARRAY_BUFFER, totalSize, nullptr, GL_STREAM_DRAW); // notice we are not passing any data yet (nullptr); we are just allocating the space for now
int offset = 0;
// positions
glBufferSubData(GL_ARRAY_BUFFER, offset, sizeof(positions), positions); // here we upload the data
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void*)offset);
offset += sizeof(positions);
// normals
glBufferSubData(GL_ARRAY_BUFFER, offset, sizeof(normals), normals);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, (void*)offset);
offset += sizeof(normals);
// texCoords
glBufferSubData(GL_ARRAY_BUFFER, offset, sizeof(texCoords), texCoords);
glEnableVertexAttribArray(2);
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 0, (void*)offset);
// the code below didn't change
// indices
GLuint ebo;
glGenBuffers(1, &ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STREAM_DRAW);
// draw
glDrawElements(GL_TRIANGLES, numInds, GL_UNSIGNED_INT, nullptr); // numInds = number of indices; the last parameter is nullptr because the indices come from the bound EBO
I haven't tested any of the code so there might be typos. Use VAOs if you can. Remember to free resources when you are done with them!
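As a minimal sketch of what the VAO part could look like (assuming a core-profile context, with `numInds` standing for the number of indices as in the draw calls above):

GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
// ... do the glBindBuffer / glEnableVertexAttribArray / glVertexAttribPointer calls from the
// examples above, plus the GL_ELEMENT_ARRAY_BUFFER binding; the VAO records this state ...
glBindVertexArray(0);

// later, when drawing:
glBindVertexArray(vao);
glDrawElements(GL_TRIANGLES, numInds, GL_UNSIGNED_INT, nullptr);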
About repeating positions for the case of a cube:
It is true that if you want to draw a cube, you are going to repeat each position 3 times (once for each adjacent face), because the normal needs to be different.
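(Concretely, for the flat-shaded cube: 6 faces × 4 corners = 24 vertices, so each of the 8 positions appears 3 times, and 6 faces × 2 triangles × 3 = 36 indices.)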
First of all, I have to make clear that, in many applications, having to repeat positions does not happen. That is because flat shading is usually not what you want.
(Image illustrating flat vs. smooth shading, from: http://www.faculty.jacobs-university.de/llinsen/teaching/320322_Fall2009/lecture13.pdf)
Most commonly, we want things to look smooth. Simple geometric shapes such as the cube are an exception, so people don't bother to do anything complicated and just repeat the position data (simple geometric shapes are usually cheap anyway).
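(With smooth shading you typically store a single normal per vertex, for example the normalized sum of the normals of the adjacent faces, so normals line up one-to-one with positions and a single index buffer is enough, UV seams aside.)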
Another way of thinking about it: a VBO, as the name implies, holds data that is assigned per vertex. In the case of a cube, the normal you want to specify belongs to a face, not to a vertex. As far as I know, mesh data formats such as .obj don't allow per-face data either.
With all that said, if you still consider that saving that repeated data is worth the effort for your application, there might be a way to do it in OpenGL using geometry shaders. Geometry shaders allow you to emit primitives as you want.
This is an example of a geometry shader. It doesn't take any normals; the normal is computed in the shader itself, once per primitive.
#version 330 core
layout (triangles) in; // in glDrawElements we specify GL_TRIANGLES
layout (triangle_strip, max_vertices = 3) out; // and we still want to output triangles (a strip of 3 vertices is one triangle)
// these are the inputs we get from the vertex shader (only the texCoord, apart from the position)
in VS_OUT {
//vec3 normal; // we don't have an input for the normal, we are going to compute it from the vertex positions
vec2 texCoord;
} vs_out[];
// these outputs will be received in the fragment shader
out GS_OUT {
flat vec3 normal; // we use the flat modifier because we don't need interpolation :) https://www.khronos.org/opengl/wiki/Type_Qualifier_(GLSL)#Interpolation_qualifiers
vec2 texCoord;
} gs_out;
void main()
{
vec3 p0 = gl_in[0].gl_Position.xyz;
vec3 p1 = gl_in[1].gl_Position.xyz;
vec3 p2 = gl_in[2].gl_Position.xyz;
// compute the normal with the cross product
gs_out.normal = normalize(cross(p1-p0, p2-p0));
gl_Position = gl_in[0].gl_Position;
gs_out.texCoord = vs_out[0].texCoord;
EmitVertex();
gl_Position = gl_in[1].gl_Position;
gs_out.texCoord = vs_out[1].texCoord;
EmitVertex();
gl_Position = gl_in[2].gl_Position;
gs_out.texCoord = vs_out[2].texCoord;
EmitVertex();
EndPrimitive();
}
I think you could go even further and not compute the normal for each triangle, only for every quad. But that is tricky!
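For reference, a minimal vertex shader that could feed the geometry shader above might look like this (a sketch; the attribute locations match the earlier examples, and no transform is applied so the snippet stays self-contained):

#version 330 core
layout (location = 0) in vec3 aPos;
layout (location = 2) in vec2 aTexCoord; // location 1 (normals) is not needed with this geometry shader

out VS_OUT {
vec2 texCoord;
} vs_out;

void main()
{
vs_out.texCoord = aTexCoord;
// the geometry shader uses gl_Position both to compute the normal and to emit the vertex,
// so any transform applied here (e.g. an MVP matrix) affects both
gl_Position = vec4(aPos, 1.0);
}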

-
Thank you! I have marked the answer as useful because this is probably as good as it gets. However, I cannot mark it as a solution because this still looks to me like a major flaw in OpenGL. I still have to repeat coordinates in positions because I can only use one index buffer. This implies that I have to keep one position buffer for the CPU side and a different one for the GPU side. This is not just a memory issue but a performance issue. – Nemanja Stojanovic Mar 03 '21 at 18:54
-
@NemanjaStojanovic No problem. Ah, now I understand your concern with repeating positions better; I have updated my answer. – tuket Mar 03 '21 at 20:36
-
Thanks. I think you've covered all of my concerns! I really should check out geometry shaders. – Nemanja Stojanovic Mar 03 '21 at 21:23
-
@NemanjaStojanovic I have added one example of what the geometry shader might look like – tuket Mar 03 '21 at 21:25