I'm basing my tests on this popular PBO example (see pboUnpack.zip from http://www.songho.ca/opengl/gl_pbo.html). Tests are done on PBO Mode 1 per the example.

Running the original sample on my NVIDIA 560GTX (PCIe x16, driver v334.89, Win7 Pro x64, Core i5 Ivy Bridge 3.6GHz), I found that glMapBufferARB() blocks for 15ms, even though the preceding glBufferDataARB() call is meant to prevent it from blocking (i.e. by discarding the PBO).

I then changed the image size from the original 1024*1024 to 400*400, thinking this would surely reduce the blocking time. To my surprise, it remained at 15ms, and CPU utilization remained high as well.

Experimenting further, I increased the image size to 4000*4000 and was surprised yet again: the time spent in glBufferDataARB dropped from 15ms to 0.1ms, and CPU utilization dropped dramatically at the same time.
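
To put the three image sizes in perspective, here is a minimal sketch of the per-frame upload size for each run. It assumes 4 bytes per pixel (e.g. BGRA), which is not stated above, and `pboBytes` is a hypothetical helper, not part of the sample.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 4 bytes per pixel (e.g. BGRA); the post does not state the format.
constexpr std::size_t kBytesPerPixel = 4;

// Hypothetical helper: size in bytes of one width x height upload.
std::size_t pboBytes(std::size_t width, std::size_t height)
{
    assert(width > 0 && height > 0);
    return width * height * kBytesPerPixel;
}
```

Under that assumption, the 400*400, 1024*1024 and 4000*4000 runs upload 640,000, 4,194,304 and 64,000,000 bytes per frame respectively, so the observed 15ms does not scale with the amount of data.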

I am at a loss to explain what is going on here, and I am hoping someone familiar with this kind of issue can shed some light on it.

Code of interest:

// bind PBO to update pixel values
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pboIds[nextIndex]);

// map the buffer object into client's memory
// Note that glMapBufferARB() causes sync issue.
// If GPU is working with this buffer, glMapBufferARB() will wait(stall)
// for GPU to finish its job. To avoid waiting (stall), you can call
// first glBufferDataARB() with NULL pointer before glMapBufferARB().
// If you do that, the previous data in PBO will be discarded and
// glMapBufferARB() returns a new allocated pointer immediately
// even if GPU is still working with the previous data.
glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, DATA_SIZE, 0, GL_STREAM_DRAW_ARB);
GLubyte* ptr = (GLubyte*)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
if(ptr)
{
    // update data directly on the mapped buffer
    updatePixels(ptr, DATA_SIZE);
    glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB); // release pointer to mapping buffer
}
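
For reference, a minimal sketch of how the CPU-side time of a single call could be measured with a monotonic, high-resolution clock. `elapsedMs` is a hypothetical helper, not part of the sample, and the usage shown in the comment assumes a current GL context. It only reports how long the calling thread spent inside the call, not how long the GPU took. A coarse system timer with a tick in the 10 to 16ms range could by itself produce readings near 15ms, so the choice of clock matters here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in milliseconds, using a monotonic clock.
double elapsedMs(const std::function<void()>& work)
{
    assert(static_cast<bool>(work)); // an empty std::function would throw on call
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    work();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count();
}

// Usage inside the update code (requires a current GL context):
//   GLubyte* ptr = 0;
//   double mapMs = elapsedMs([&] {
//       ptr = (GLubyte*)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
//   });
```

Timing glBufferDataARB(), glMapBufferARB() and the buffer swap separately in this way would show which call the calling thread is actually waiting in.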
Zach Saw
  • How do you measure the time `MapBuffer` is supposedly stalling? Also, the ~15ms sounds to me like sync to VBlank at 60Hz. The nvidia driver will buffer a few frames in advance, but it will eventually stall at some point if you are continuously rendering too fast. That does not necessarily happen at the SwapBuffers call, so depending on how you measure, you might see it here. – derhass Apr 18 '14 at 13:19
  • To add to this: you have a GPU/driver combo capable of using OpenGL timer queries, so you might consider looking into them. They will tell you how much time was actually spent on a command or sequence of commands in the pipeline, rather than any artificial stalling or minimal front-end overhead on the CPU side (*e.g.* some commands do little more than validation and then return control to the CPU, and the actual bulk of the work might not occur until your next draw command). – Andon M. Coleman Apr 18 '14 at 15:12
  • @derhass Vsync doesn't really explain the huge speed up when an image size of 4000*4000 is used though, does it? – Zach Saw Apr 19 '14 at 00:59
  • @ZachSaw: well, one _could_ at least _construct_ some scenario in which it actually might. – derhass Apr 19 '14 at 02:20
