
In this question I'm interested in buffer drawing in OpenGL, specifically in the trade-off between using one buffer per data set and one buffer for more than one data set.

Context:

Consider a data set of N vertices, each represented by a set of attributes (e.g. color, texture coordinates, normals). Each attribute is represented by a type (e.g. GLfloat, GLint) and a number of components (2, 3, or 4). We want to draw this data. Schematically,

(non-interleaved representation)

   data set
<-------------->
 a_1  a_2   a_3
<---><---><---->
a_i = attribute; e.g. a_2 = 3 GLfloats representing color, thus 3*N GLfloats in total
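For concreteness, a minimal C++ sketch of how such a non-interleaved data set could be described on the CPU side (the AttributeDesc struct and its field names are my own bookkeeping, not part of the OpenGL API):

#include <cstddef>
#include <vector>
// plus the usual GL headers / loader for GLenum, GLint, GLfloat

// Hypothetical description of one attribute (a_i) of the data set.
struct AttributeDesc {
    GLenum      type;       // e.g. GL_FLOAT
    GLint       components; // 2, 3 or 4
    std::size_t typeSize;   // e.g. sizeof(GLfloat)
};

// Non-interleaved layout: all of a_1 for the N vertices, then all of a_2, ...
std::size_t dataSetSize(const std::vector<AttributeDesc>& attrs, std::size_t N) {
    std::size_t total = 0;
    for (const auto& a : attrs)
        total += a.typeSize * static_cast<std::size_t>(a.components) * N;
    return total;
}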

We want to map this into the GL state, using glBufferSubData.

Problem

When mapping, we have to keep track of the data in our memory, because glBufferSubData requires a start and a size. This sounds to me like an allocation problem: we want to allocate memory and keep track of its position. Since we want fast access to it, we would like the data to be contiguous in memory, e.g. in a single std::vector<char>. Schematically,

  data set 1    data set 2
<------------><-------------->
(both have same buffer id)

We commit this to the GL state as:

// "id" is the buffer object bound to the single std::vector<char>, "data".
glBindBuffer(target, id);

// for each data_set (AFTER calling glBindBuffer):

//   for each attribute:

//     "start": the byte offset of the attribute within the buffer.
//     "size":  (sizeof(type) * components of the attribute) * N.
glBufferSubData(target, start, size, data.data() + start);

(non-interleaved, for the sake of the code).
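A minimal sketch of this shared-buffer variant, assuming a valid OpenGL context, that both data sets already live back to back in the single std::vector<char> "data", and that the offsets and sizes (offset1, size1, offset2, size2) are bookkeeping values tracked by us, not by OpenGL:

// One buffer object backs the whole std::vector<char>.
GLuint id = 0;
glGenBuffers(1, &id);
glBindBuffer(GL_ARRAY_BUFFER, id);

// Allocate storage for everything once...
glBufferData(GL_ARRAY_BUFFER, data.size(), nullptr, GL_STATIC_DRAW);

// ...then commit each data set (or each attribute of it) as a sub-range.
glBufferSubData(GL_ARRAY_BUFFER, offset1, size1, data.data() + offset1);
glBufferSubData(GL_ARRAY_BUFFER, offset2, size2, data.data() + offset2);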

The problem arises when we want to add or remove vertices, e.g. when the LOD changes. Because each data set must be a contiguous chunk, for instance to allow interleaved drawing (even in the non-interleaved case, each attribute is a contiguous chunk), we end up with fragmentation in our std::vector<char>.
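For example, if data set 1 shrinks after an LOD change, the shared buffer forces us either to leave a hole or to shift data set 2 and re-upload everything. A sketch of the latter (compact and re-upload; the offset names newEndOfSet1, oldEndOfSet1, sizeOfSet2 are illustrative):

// Data set 1 shrank: move data set 2 left so the vector stays contiguous,
// then re-specify the whole (now smaller) buffer. Needs <cstring>.
std::memmove(data.data() + newEndOfSet1,   // new start of data set 2
             data.data() + oldEndOfSet1,   // old start of data set 2
             sizeOfSet2);
data.resize(newEndOfSet1 + sizeOfSet2);

glBindBuffer(GL_ARRAY_BUFFER, id);
glBufferData(GL_ARRAY_BUFFER, data.size(), data.data(), GL_STATIC_DRAW); // reallocates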


On the other hand, we can also use one chunk per buffer: instead of assigning all chunks to the same buffer, we assign each chunk, now its own std::vector<char>, to a different buffer. Schematically,

  data set 1 (buffer id1)
<------------>
  data set 2 (buffer id2)
<-------------->

We commit the data to the GL state as:

// for each data_set (BEFORE calling glBindBuffer):
//   "data" is the std::vector<char> of this data_set.

//   "id" is now bound to this specific std::vector<char>.
glBindBuffer(target, id);

//   for each attribute:

//     "start": the byte offset of the attribute within this buffer.
//     "size":  (sizeof(type) * components of the attribute) * N.
glBufferSubData(target, start, size, data.data() + start);
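A corresponding sketch of the one-buffer-per-data-set variant, assuming each data set owns its own std::vector<char> (the DataSet struct is illustrative, not from any API):

// Illustrative bookkeeping: each data set owns its bytes and its buffer object.
struct DataSet {
    GLuint            id = 0;
    std::vector<char> data;   // non-interleaved attribute blocks
};

void upload(std::vector<DataSet>& sets) {
    for (auto& s : sets) {
        if (s.id == 0) glGenBuffers(1, &s.id);
        glBindBuffer(GL_ARRAY_BUFFER, s.id);
        // Resizing this data set only reallocates its own buffer,
        // without touching any other data set.
        glBufferData(GL_ARRAY_BUFFER, s.data.size(), s.data.data(), GL_STATIC_DRAW);
    }
}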

Questions

I'm learning this, so, before any of the below: is this reasoning correct?

Assuming yes,

  1. Is it a problem to have an arbitrary number of buffers?
  2. Is "glBindBuffer" expected to scale with the number of buffers?
  3. What are the major points to take into consideration in this decision?
Jorge Leitao

1 Answer


It is not quite clear whether you are asking about performance trade-offs, but I will answer in that vein.

  1. Is it a problem to have an arbitrary number of buffers?

It is a problem that dates back to the dark medieval times when pipelines were fixed, and it remains today for backward-compatibility reasons. glBind* calls are considered one of the performance bottlenecks in modern OpenGL drivers, caused by poor locality of reference and cache misses. Simply speaking, the cache is cold, and for a large part of the time the CPU just waits in the driver for data to be transferred from main memory. There is nothing driver implementers can do about it with the current API. Read NVIDIA's short article about it and their bindless extension proposals.

2. Is "glBindBuffer" expected to scale with the number of buffers?

Sure: the more objects (buffers, in your case), the more bind calls, and the more performance is lost in the driver. But merged, huge resource objects are less manageable.
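To make the scaling concrete, here is a sketch of what the draw loop looks like under the two layouts (attribute setup reduced to a single float3 attribute; the names sharedId, offsetOfSet1, n1, sets and vertexCount are illustrative):

// One shared buffer: a single bind, data sets selected via byte offsets.
glBindBuffer(GL_ARRAY_BUFFER, sharedId);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0,
                      reinterpret_cast<const void*>(offsetOfSet1));
glDrawArrays(GL_TRIANGLES, 0, n1);

// One buffer per data set: one bind (and attribute re-specification) per data set.
for (const auto& s : sets) {
    glBindBuffer(GL_ARRAY_BUFFER, s.id);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);
    glDrawArrays(GL_TRIANGLES, 0, s.vertexCount);
}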

  3. What are the major points to take into consideration in this decision?

Only one: profiling results ;) "Premature optimization is the root of all evil", so try to stay as objective as possible and believe only the numbers. When the numbers go bad, we can think about it like this:

"Huge", "all in one" resources:

  • less bind calls
  • less context changes
  • harder to manage and debug; needs some additional code infrastructure (e.g. to update resource data)
  • resizing (reallocation) is very slow

Separate resources:

  • more bind calls, losing time in the driver
  • more context changes
  • easier to manage, less error-prone
  • easy to resize, allocate, reallocate

In the end, we can see a performance-complexity trade-off and different behavior when updating data. To stick with one approach or the other, you must:

  • decide whether you would like to keep things simple and manageable, or add complexity and gain additional FPS (profile in graphics profilers to know how much; is it worth it?)
  • know how often you resize/reallocate buffers (trace API calls in graphics debuggers).

Hope it helps somehow ;)

If you like theoretical assertions like this one, you will probably be interested in another one, about interleaving (a DirectX one).

Ivan Aksamentov - Drop