Poor performance of memcpy for data mapped with glMapBufferRange

Question

I am trying to use memcpy to read 4 megabytes from a pixel buffer mapped with glMapBufferRange. My platform is a Samsung Galaxy S7 Exynos (Mali GPU). The read is very slow: copying the data takes about 75 milliseconds.

I initialize buffers like this:

glGenBuffers(NUM_BUFFERS, pbo_id);
for (int i = 0; i < NUM_BUFFERS; i++) {
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_id[i]);
    glBufferData(GL_PIXEL_PACK_BUFFER, PBO_SIZE, 0, GL_DYNAMIC_READ);

    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

I read pixels into the buffers:

glReadBuffer(GL_COLOR_ATTACHMENT0);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_id[counter%NUM_BUFFERS]);
glReadPixels(0, 0, IMAGE_WIDTH, IMAGE_WIDTH, GL_RGBA, GL_UNSIGNED_BYTE, 0);

Then I read from the buffers using the following code:

glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_id[(counter - (NUM_BUFFERS-1)) % NUM_BUFFERS]);

void *ptr = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, PBO_SIZE, GL_MAP_READ_BIT);

memcpy(&buffer, ptr, PBO_SIZE);

glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

I read from the oldest buffer, pbo_id[(counter - (NUM_BUFFERS-1)) % NUM_BUFFERS], rather than the one just written, so that the asynchronous operation has time to finish.
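One detail in the index expression above: in C, `%` keeps the sign of the left operand, so `(counter - (NUM_BUFFERS-1)) % NUM_BUFFERS` is negative while `counter` is smaller than `NUM_BUFFERS-1`. That assumes `counter` is a signed int starting at 0, which the question does not show. A minimal sketch of an equivalent form that stays in range follows; `write_index` and `read_index` are hypothetical helper names, not part of the original code:

```c
#include <assert.h>

/* Index of the PBO that glReadPixels fills on this frame. */
static int write_index(int counter, int num_buffers)
{
    assert(num_buffers > 0 && counter >= 0);
    return counter % num_buffers;
}

/* Index of the oldest PBO in the ring, i.e. the one to map and read.
 * (counter + 1) is congruent to (counter - (num_buffers - 1)) modulo
 * num_buffers, but the result can never be negative. */
static int read_index(int counter, int num_buffers)
{
    assert(num_buffers > 0 && counter >= 0);
    return (counter + 1) % num_buffers;
}
```

With one buffer both helpers return 0, which matches the single-buffer case where the buffer just written is mapped immediately.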

With just one buffer the time rises to 90 milliseconds; with two or more the operation takes 75 milliseconds.

I consider this slow, because memcpy on other regions of memory finishes in 1 or 2 milliseconds.
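To put these numbers side by side, here is a rough throughput calculation. It assumes "4 megabytes" means 4 MiB (for example 1024 x 1024 RGBA pixels); `throughput_mib_per_s` is a hypothetical helper used only for this arithmetic:

```c
#include <assert.h>
#include <stddef.h>

/* Copy throughput in MiB per second for `bytes` copied in `milliseconds`. */
static double throughput_mib_per_s(size_t bytes, double milliseconds)
{
    assert(milliseconds > 0.0);
    const double mib = (double)bytes / (1024.0 * 1024.0);
    return mib / (milliseconds / 1000.0);
}
```

Under that assumption, 75 ms corresponds to roughly 53 MiB/s from the mapped buffer, while 1 to 2 ms corresponds to about 4000 to 2000 MiB/s from ordinary memory.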

The documentation for glMapBufferRange has a note:

Mappings to the data stores of buffer objects may have nonstandard performance characteristics. For example, such mappings may be marked as uncacheable regions of memory, and in such cases reading from them may be very slow. To ensure optimal performance, the client should use the mapping in a fashion consistent with the values of GL_BUFFER_USAGE and access. Using a mapping in a fashion inconsistent with these values is liable to be multiple orders of magnitude slower than using normal memory.

So the question is: what is wrong, and how can I improve the performance of reading from the buffer?

Just to check it's not stalling, does it go any faster if you use MAP_UNSYNCHRONIZED_BIT? — solidpixel, Jul 14 '17 at 10:09
@solidpixel, as I understand I cannot use it with MAP_READ_BIT flag for glMapBufferRange. At least the documentation says `This flag may not be used in combination with MAP_READ_BIT` — Alexander Ponomarev, Jul 14 '17 at 10:20
Sure, I was just trying to understand where the time is going. What is the split between the `memcpy()` and the `glMapBufferRange()` call? If the `glMapBufferRange()` is stalling for a long time then it's possible you are just blocking and waiting for the GPU rendering to complete. — solidpixel, Jul 25 '17 at 10:54
@solidpixel It is the memcpy that is slow. I think I found a solution here: https://community.arm.com/graphics/f/discussions/6657/how-to-gain-performance-through-pbo-pixel-buffer-object-on-mali-t-880 I just have not tried it yet — Alexander Ponomarev, Jul 25 '17 at 11:09
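The split asked about in the comments could be measured by taking a monotonic timestamp around each call. This is a sketch only: `now_monotonic` and `elapsed_ms` are hypothetical helpers, and it assumes a POSIX `clock_gettime`, which should be available on Android:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <time.h>

/* Current time from a clock that is not affected by wall-clock changes. */
static struct timespec now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Milliseconds elapsed between two timestamps. */
static double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    assert(start && end);
    return (double)(end->tv_sec - start->tv_sec) * 1000.0
         + (double)(end->tv_nsec - start->tv_nsec) / 1.0e6;
}

/* Intended use around the calls from the question:
 *
 *   struct timespec t0 = now_monotonic();
 *   void *ptr = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, PBO_SIZE, GL_MAP_READ_BIT);
 *   struct timespec t1 = now_monotonic();
 *   memcpy(&buffer, ptr, PBO_SIZE);
 *   struct timespec t2 = now_monotonic();
 *
 *   elapsed_ms(&t0, &t1) is the time spent in the map call,
 *   elapsed_ms(&t1, &t2) is the time spent in the copy itself.
 */
```

A large first interval would point to the map call waiting on the GPU, while a large second interval would point to slow reads from the mapped memory.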
