I'm having an unusual problem while working on an OpenGL ES project. Essentially, I need frame data in GRAYSCALE single-channel format for some computer vision (CV) processing. I'm using a custom shader, an FBO and PBOs to get the task done.
The flow of the program is as follows.
- bind the generated FBO
- draw() to the FBO
- bind PBO and glReadPixels()
- bind PBO from previous frame and glMapBufferRange()
- process the provided pixel data from glMapBufferRange()
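The flow above relies on two PBOs that alternate every frame. Below is a minimal pure-Java model of that bookkeeping. It makes no GL calls, and the class and method names are hypothetical. It shows which PBO `glReadPixels` writes into on each frame and which PBO gets mapped. It also makes one detail explicit: on the very first frame the "previous" PBO has never been written, so mapping it yields undefined contents.

```java
// Pure-Java model of the two-PBO schedule: no GL calls, only the index bookkeeping.
final class PboSchedule {
    private PboSchedule() {}

    /** PBO index that glReadPixels writes into on the given frame (0, 1, 0, 1, ...). */
    static int readTarget(long frame) {
        return (int) (frame % 2);
    }

    /** PBO index to map on the given frame: the one written on the previous frame. */
    static int mapTarget(long frame) {
        return (int) ((frame + 1) % 2);
    }

    /** On frame 0 the other PBO has never been written, so there is nothing valid to map yet. */
    static boolean hasDataToMap(long frame) {
        return frame >= 1;
    }

    public static void main(String[] args) {
        for (long f = 0; f < 4; f++) {
            System.out.printf("frame %d: read -> PBO %d, map %s%n",
                    f, readTarget(f), hasDataToMap(f) ? "PBO " + mapTarget(f) : "(nothing yet)");
        }
    }
}
```

With this scheme the CPU always sees data one frame late. That latency is the price of giving the GPU a full frame to finish each transfer.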
I'd like to confirm that the process is working correctly, and to find out whether anything can be done to improve performance. I'll post some of the code I'm using so everyone can follow along.
The PBO generation code:
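```java
// Create two pixel-pack buffers so reads and maps can alternate between frames.
final int[] pbuffers = new int[2];
GLES30.glGenBuffers(2, pbuffers, 0);
for (int i = 0; i < pbuffers.length; i++) {
    GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbuffers[i]);
    // One byte per pixel (GL_RED / GL_UNSIGNED_BYTE). width * height is only exact when
    // each row is a multiple of GL_PACK_ALIGNMENT (default 4), e.g. width = 480.
    GLES30.glBufferData(GLES30.GL_PIXEL_PACK_BUFFER, width * height, null, GLES30.GL_DYNAMIC_READ);
    GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
}
pbo_id[PBO_PRIMARY_ID] = pbuffers[0];
pbo_id[PBO_SECONDARY_ID] = pbuffers[1];
```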
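The `width * height` buffer size above is correct for a 480-pixel-wide frame. For other widths, each row `glReadPixels` packs is padded up to `GL_PACK_ALIGNMENT`, which defaults to 4. The following is a small sketch, with hypothetical names, of the minimum pack-buffer size for 1-byte components such as `GL_UNSIGNED_BYTE`. Calling `glPixelStorei(GL_PACK_ALIGNMENT, 1)` before the read would be the alternative that removes the padding altogether.

```java
// Minimum PBO size for glReadPixels with 1-byte components (e.g. GL_UNSIGNED_BYTE)
// under a given GL_PACK_ALIGNMENT (default 4 in OpenGL ES).
final class PackSize {
    private PackSize() {}

    /** Bytes between the starts of consecutive rows: width * bytesPerPixel rounded up to alignment. */
    static int rowStride(int width, int bytesPerPixel, int packAlignment) {
        int rowBytes = width * bytesPerPixel;
        return (rowBytes + packAlignment - 1) / packAlignment * packAlignment;
    }

    /** Every row but the last is padded to the stride; the last row only needs its own pixels. */
    static int minBufferSize(int width, int height, int bytesPerPixel, int packAlignment) {
        if (width <= 0 || height <= 0) {
            return 0;
        }
        return (height - 1) * rowStride(width, bytesPerPixel, packAlignment) + width * bytesPerPixel;
    }

    public static void main(String[] args) {
        System.out.println(minBufferSize(480, 360, 1, 4)); // 172800: 480 is a multiple of 4, no padding
        System.out.println(minBufferSize(481, 360, 1, 4)); // 174237: rows padded to 484 bytes
    }
}
```

The same `rowStride` value is what CPU-side code has to step by when walking the mapped rows.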
Step 3 from the list -> bind PBO and glReadPixels()
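```java
// Bind this frame's PBO; with a pack buffer bound, the last glReadPixels argument
// is a byte offset into the PBO rather than a client-memory pointer.
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo_id[currentBuffer]);
GLES30.glReadBuffer(GLES30.GL_COLOR_ATTACHMENT0);
JNI.glReadPixels(0, 0, width, height, GL_RED, GL_UNSIGNED_BYTE, 0);
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
// Swap the indices: currentBuffer now refers to the PBO written on the previous frame.
final int prevBuffer = previousBuffer;
previousBuffer = currentBuffer;
currentBuffer = prevBuffer;
```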
Step 4 from the list -> bind the PBO from the previous frame and call glMapBufferRange(). This is the PBO that glReadPixels wrote into on the last frame.
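```java
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo_id[currentBuffer]);
// Map last frame's pixels for reading; the mapping is only valid until glUnmapBuffer.
JNI.glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, width * height, GL_MAP_READ_BIT);
// ... process the mapped pixel data here (step 5), before unmapping ...
GLES30.glUnmapBuffer(GLES30.GL_PIXEL_PACK_BUFFER);
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
```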
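For step 5 ("process the provided pixel data"), the mapped pointer is only valid until `glUnmapBuffer`. A common pattern is to copy the frame into a Java array, unmap right away, and then run the CV code on the copy. This is a sketch under a few assumptions:

- The mapped range is available as a `java.nio.ByteBuffer`. Android's own `GLES30.glMapBufferRange` returns a `java.nio.Buffer`, but what the `JNI` wrapper here returns isn't shown.
- `GrayCopy` and `copyPacked` are hypothetical names.
- `rowStride` equals `width` whenever rows need no pack padding.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Copies one mapped GL_RED / GL_UNSIGNED_BYTE frame into a tightly packed array.
final class GrayCopy {
    private GrayCopy() {}

    static void copyPacked(ByteBuffer mapped, int width, int height, int rowStride, byte[] out) {
        if (out.length < width * height) {
            throw new IllegalArgumentException("output array too small");
        }
        // Work on a duplicate so the caller's position and limit are left untouched.
        ByteBuffer src = mapped.duplicate();
        for (int y = 0; y < height; y++) {
            src.position(y * rowStride);
            src.get(out, y * width, width); // copy one row, skipping any padding bytes
        }
    }

    public static void main(String[] args) {
        // Simulated 3 x 2 frame whose rows are padded to a 4-byte stride.
        ByteBuffer mapped = ByteBuffer.wrap(new byte[] {1, 2, 3, 0, 4, 5, 6, 0});
        byte[] gray = new byte[3 * 2];
        copyPacked(mapped, 3, 2, 4, gray);
        System.out.println(Arrays.toString(gray)); // prints [1, 2, 3, 4, 5, 6]
    }
}
```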
And this is where the performance problem comes from. I'm currently reading back 480 x 360 single-channel grayscale pixels (computed by a shader). I've run some benchmarks; the results are below.
40-50ms -> JNI.glReadPixels(0, 0, width, height, GL_RED, GL_UNSIGNED_BYTE, 0);
0-1ms -> JNI.glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, width * height, GL_MAP_READ_BIT);
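As a back-of-the-envelope check on these numbers: one frame is 480 x 360 x 1 = 172,800 bytes. If all 40 ms were spent copying, that would be only about 4.3 MB/s, which is orders of magnitude below what mobile memory can move. That points towards the call waiting on the GPU (for example, an implicit sync on pending rendering) rather than doing the copy itself. This is an inference from the arithmetic, not a measurement. The class name below is hypothetical.

```java
// Back-of-the-envelope numbers for one 480 x 360 GL_RED / GL_UNSIGNED_BYTE readback.
final class ReadbackMath {
    private ReadbackMath() {}

    static long bytesPerFrame(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    /** Effective transfer rate in bytes per second if the whole stall were spent copying. */
    static double effectiveBytesPerSecond(long bytes, double millis) {
        return bytes / (millis / 1000.0);
    }

    public static void main(String[] args) {
        long bytes = bytesPerFrame(480, 360, 1);
        System.out.println(bytes + " bytes per frame");
        System.out.printf("%.2f MB/s at 40 ms%n", effectiveBytesPerSecond(bytes, 40.0) / 1e6);
    }
}
```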
My understanding is that glReadPixels into a bound PBO is not meant to be a blocking call, but for whatever reason it blocks here (and performs far worse than simply reading from the FBO without a PBO). glMapBufferRange, on the other hand, seems to behave as expected and returns the required data correctly.
The only thing I can think of is that I'm using GL_RED and reading back only a single channel, but that still doesn't explain why glReadPixels blocks.
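One way to rule the GL_RED format in or out: for a normalized fixed-point color buffer, OpenGL ES 3.0 guarantees that glReadPixels accepts `GL_RGBA`/`GL_UNSIGNED_BYTE`. Beyond that, it accepts only one implementation-chosen pair, which is queried with `glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_FORMAT, ...)` and `GL_IMPLEMENTATION_COLOR_READ_TYPE` while the FBO is bound. Since the read appears to return correct data, GL_RED is probably that pair, but it is cheap to confirm. Below is a pure-Java sketch of the check. The class name is hypothetical, and the constants are the standard GL enum values that `GLES30` also exposes.

```java
// Checks whether a format/type pair is one glReadPixels must accept for a normalized
// fixed-point color buffer in OpenGL ES 3.0: GL_RGBA/GL_UNSIGNED_BYTE, or the
// implementation-chosen pair reported by GL_IMPLEMENTATION_COLOR_READ_FORMAT/TYPE.
final class ReadFormatCheck {
    // Same values as GLES30.GL_RED, GLES30.GL_RGBA and GLES30.GL_UNSIGNED_BYTE.
    static final int GL_RED = 0x1903;
    static final int GL_RGBA = 0x1908;
    static final int GL_UNSIGNED_BYTE = 0x1401;

    private ReadFormatCheck() {}

    /** implFormat/implType: values obtained on the GL thread via glGetIntegerv with the FBO bound. */
    static boolean isAccepted(int format, int type, int implFormat, int implType) {
        boolean alwaysSupported = format == GL_RGBA && type == GL_UNSIGNED_BYTE;
        boolean implPreferred = format == implFormat && type == implType;
        return alwaysSupported || implPreferred;
    }
}
```

If this returns false for GL_RED on a device, glReadPixels should raise GL_INVALID_OPERATION rather than silently convert, so checking glGetError after the read is a useful companion test.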
Devices I've used for benchmarking (the behaviour was consistent):
- HTC One M8s (40-50ms)
- Nexus 5x (20-30ms)
- Google Pixel (15-30ms)
Any help in this matter would be highly appreciated! In the meantime, I'm going to experiment a bit more to see if there is anything obvious that I've missed.
EDIT -> 16/03/2017 (Added more code for clarity)
FBO Setup Code
```java
final int[] values = new int[1];
GLES30.glGenTextures(1, values, 0);
GLES30.glBindTexture(GLES30.GL_TEXTURE_2D, values[0]);
// we only want GRAYSCALE / Single channel texture
GLES30.glTexImage2D(GLES30.GL_TEXTURE_2D, 0, GLES30.GL_R8, texWidth, texHeight, 0, GLES30.GL_RED, GLES30.GL_UNSIGNED_BYTE, null);
GLES30.glTexParameteri(GLES30.GL_TEXTURE_2D, GLES30.GL_TEXTURE_WRAP_S, GLES30.GL_CLAMP_TO_EDGE);
GLES30.glTexParameteri(GLES30.GL_TEXTURE_2D, GLES30.GL_TEXTURE_WRAP_T, GLES30.GL_CLAMP_TO_EDGE);
GLES30.glTexParameteri(GLES30.GL_TEXTURE_2D, GLES30.GL_TEXTURE_MIN_FILTER, GLES30.GL_NEAREST);
GLES30.glTexParameteri(GLES30.GL_TEXTURE_2D, GLES30.GL_TEXTURE_MAG_FILTER, GLES30.GL_NEAREST);
this.tex_id[0] = values[0];

GLES30.glGenFramebuffers(1, values, 0);
GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, values[0]);
this.fbo_id[0] = values[0];
GLES30.glFramebufferTexture2D(GLES30.GL_FRAMEBUFFER, GLES30.GL_COLOR_ATTACHMENT0, GLES30.GL_TEXTURE_2D, this.tex_id[0], 0);
final int status = GLES30.glCheckFramebufferStatus(GLES30.GL_FRAMEBUFFER);
if (status != GLES30.GL_FRAMEBUFFER_COMPLETE) {
    Debug.LogError("Framebuffer incomplete. Status: " + status);
}
GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, 0);
```
The full render code. I've flattened as much of the logic and flow as possible for clarity.
```java
// bind the offscreen FBO and render the current camera frame
GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, dualFBO.getID());
camera.draw(ShaderType.GRAYSCALE);

// ping-pong the FBO IDs
dualFBO.swap();

// dualFBO will now return the ID for last frame
GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, dualFBO.getID());

// bind the current PBO and issue glReadPixels (meant to be async)
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, dualPBO.getID());
GLES30.glReadBuffer(GLES30.GL_COLOR_ATTACHMENT0);
// this call blocks for 30-50 ms... why? (isn't it meant to be async?)
JNI.glReadPixels(0, 0, width, height, GL_RED, GL_UNSIGNED_BYTE, 0);
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);

// ping-pong the PBO IDs
dualPBO.swap();

// dualPBO will now return the PBO filled by last frame's glReadPixels
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, dualPBO.getID());
// this call is instant
JNI.glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, width * height, GL_MAP_READ_BIT);
// ... process the mapped pixel data here, before unmapping ...
GLES30.glUnmapBuffer(GLES30.GL_PIXEL_PACK_BUFFER);
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
```
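The `dualFBO` and `dualPBO` classes aren't shown. To make the render code easier to follow, here is a hypothetical sketch of such a ping-pong holder, modelling only the two calls used above, `getID()` and `swap()`. The real classes may well differ.

```java
// Hypothetical ping-pong holder in the spirit of dualFBO / dualPBO: two GL object
// names, one "current" slot, and a swap that flips between them.
final class DualId {
    private final int[] ids;
    private int current;

    DualId(int first, int second) {
        this.ids = new int[] {first, second};
    }

    /** GL object name for the slot currently in use. */
    int getID() {
        return ids[current];
    }

    /** Flip to the other slot; with one swap per frame, getID() then returns last frame's object. */
    void swap() {
        current ^= 1;
    }
}
```

With exactly one `swap()` per frame on each holder, every `glReadPixels` targets the FBO drawn on the previous frame, and every map targets the PBO filled on the previous frame.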