I need to blend thousands of pairs of images very fast.
My code currently does the following. _apply is a pointer to a function such as Blend. Blend is only one of many functions we can pass in: each takes two byte values and returns a third, and it is applied to every channel of every pixel. I would prefer a solution that works for any such function rather than one specific to blending.
typedef byte (*Transform)(byte src1,byte src2);
Transform _apply;
for (int i = 0; i < _frameSize; i++)
{
    // Combine the existing source byte with the matching blend byte, in place.
    source[i] = _apply(source[i], blend[i]);
}
byte Blend(byte src, byte blend)
{
    int resultPixel = (src + blend) / 2;
    return (byte)resultPixel;
}
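Staying on the CPU for a moment, one general option is to pass the operation as a template parameter instead of a function pointer. A call through a pointer for every byte may keep the compiler from inlining and vectorizing the loop. The following is only a minimal sketch: BlendOp and applyTransform are hypothetical names, byte is assumed to be an unsigned 8-bit type, and any speedup would have to be measured.

```cpp
#include <cstddef>
#include <cstdint>

using byte = std::uint8_t;  // assumption: byte is an unsigned 8-bit type

// Hypothetical functor with the same arithmetic as Blend.
struct BlendOp {
    byte operator()(byte src, byte blend) const {
        return static_cast<byte>((src + blend) / 2);
    }
};

// Applies op to every byte pair and writes the result back into source.
// Op is known at compile time, so the call can be inlined.
template <typename Op>
void applyTransform(byte* source, const byte* blend, std::size_t frameSize, Op op) {
    for (std::size_t i = 0; i < frameSize; ++i) {
        source[i] = op(source[i], blend[i]);
    }
}
```

Usage would be applyTransform(source, blend, _frameSize, BlendOp{}); any other two-input operation can be written as a similar functor or lambda.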
I was doing this on the CPU, but the performance is terrible. My understanding is that doing this on the GPU is very fast. My program needs to run on computers that will have either Nvidia or Intel GPUs, so whatever solution I use needs to be vendor independent. If I use the GPU, it has to be through OpenGL so that it is platform independent as well.
I think a GLSL pixel (fragment) shader would help, but I am not familiar with pixel shaders or with how to apply them to 2D objects such as my images.
Is that a reasonable solution? If so, how do I do this in 2D? If there is a library that already does this, that would also be great to know.
EDIT: I am receiving the image pairs from different sources. One always comes from a 3D graphics component in OpenGL (so it is originally on the GPU). The other comes from system memory, either from a socket (as a compressed video stream) or from a memory-mapped file. The "sink" of the resulting image is the screen. I am expected to show the images on the screen, so going to the GPU is an option, as is using something like SDL to display them.
The blend function that is going to be executed the most is this one:
byte Patch(byte delta, byte lo)
{
    int resultPixel = (2 * (delta - 127)) + lo;
    if (resultPixel > 255)
        resultPixel = 255;
    if (resultPixel < 0)
        resultPixel = 0;
    return (byte)resultPixel;
}
EDIT 2: The image coming from GPU land arrives as follows: from the FBO to a PBO to system memory.
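// Read from the FBO's first color attachment.
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glReadBuffer(GL_COLOR_ATTACHMENT0);
// With a PBO bound to GL_PIXEL_PACK_BUFFER, the last argument of
// glReadPixels is a byte offset into the PBO, not a client pointer.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0, 0, width, height, GL_BGR, GL_UNSIGNED_BYTE, 0);
// The PBO is still bound, so map it to read the pixels from system memory.
void* mappedRegion = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);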
It seems it is probably better to do all of the work in GPU memory. The other bitmap can come from system memory. Eventually we may also get it from a video decoder in GPU memory.
EDIT 3: One of my images will come from D3D while the other comes from OpenGL. It seems that something like Thrust or OpenCL is the best option.