I am trying to read 4 megabytes from a pixel buffer mapped with glMapBufferRange using memcpy. My platform is a Samsung Galaxy S7 Exynos (Mali GPU). The problem is very poor read performance: copying the data takes about 75 milliseconds.
I initialize the buffers like this:
glGenBuffers(NUM_BUFFERS, pbo_id);
for (int i = 0; i < NUM_BUFFERS; i++) {
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_id[i]);
    glBufferData(GL_PIXEL_PACK_BUFFER, PBO_SIZE, 0, GL_DYNAMIC_READ);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}
I read pixels into the buffers:
glReadBuffer(GL_COLOR_ATTACHMENT0);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_id[counter%NUM_BUFFERS]);
glReadPixels(0, 0, IMAGE_WIDTH, IMAGE_WIDTH, GL_RGBA, GL_UNSIGNED_BYTE, 0);
Then I read from the buffers using the following code:
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_id[(counter - (NUM_BUFFERS-1)) % NUM_BUFFERS]);
void *ptr = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, PBO_SIZE, GL_MAP_READ_BIT);
memcpy(&buffer, ptr, PBO_SIZE);
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
I read from a previously filled buffer, pbo_id[(counter - (NUM_BUFFERS-1)) % NUM_BUFFERS], to make sure the asynchronous glReadPixels operation into it has finished.
With a single buffer the copy takes about 90 milliseconds; with two or more buffers it takes about 75 milliseconds.
I suspect the mapped memory itself is what makes this slow, because memcpy on other regions of memory finishes in 1 or 2 milliseconds.
The documentation for glMapBufferRange has a note:
Mappings to the data stores of buffer objects may have nonstandard performance characteristics. For example, such mappings may be marked as uncacheable regions of memory, and in such cases reading from them may be very slow. To ensure optimal performance, the client should use the mapping in a fashion consistent with the values of GL_BUFFER_USAGE and access. Using a mapping in a fashion inconsistent with these values is liable to be multiple orders of magnitude slower than using normal memory.
So my question is: what is wrong, and how can I improve the performance of reading from the buffer?