Warning: Long answer, lots of numbers.
Short version: Depending on your overlays, the code below can almost double your framerate.
Looking at the posted code, a few things come to mind:
1. As the color channels are bytes, it seems more natural to treat them as such instead of doing all the masking and shifting, cheap as it may be.
2. You do quite a few calculations with oalpha; unless you expect it to be mostly something other than 255 or 0, extra branches for those two values would save some multiplications (6 per such pixel).
3. Since you don't show how you call the routine, you may already be doing this, but this kind of thing begs for parallel processing; if you get 25fps on one core, HD shouldn't be a problem on a multicore machine, where even something as simple as a Parallel.For will multiply your output.
4. Additionally, there is the option of using LockBits & Marshalling instead of unsafe; not sure if that'll be faster, but I guess I will write a benchmark to do some tests.
BTW: There is an error in your code, as far as I can see. I think you need to change this
*pOut = bOut | gOut << 8 | rOut << 16 | 0x00 << 24;
to this, or else the output has an alpha channel of 0 (fully transparent):
*pOut = (bOut | gOut << 8 | rOut << 16 ) | 0xff000000;
Or you may want to calculate the final alpha instead.
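If the output should keep some transparency rather than be forced to opaque, the usual source-over rule derives the resulting alpha from the two input alphas. Below is a minimal sketch of that rule on byte values; the class and method names (AlphaMath, BlendAlpha) are made up for illustration, and it assumes the background alpha byte is read the same way as oalpha.

```csharp
using System;

public static class AlphaMath
{
    // Source-over alpha: the overlay's alpha plus whatever share of the
    // background's alpha still shows through. Inputs and result are 0..255.
    // Hypothetical helper, not part of the posted code.
    public static byte BlendAlpha(byte backAlpha, byte overlayAlpha)
    {
        int result = overlayAlpha + backAlpha * (255 - overlayAlpha) / 255;
        return (byte)result; // never exceeds 255 for byte inputs
    }
}
```

With an opaque background (backAlpha == 255) this always yields 255, which matches the 0xff fix above. Note that the color formula in the posted code treats the background as opaque; a translucent background would need the colors handled accordingly, which this sketch does not cover.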
Update 1: First tests show your code to be a good deal faster (~2x) than a LockBits & Marshalling version (unless I messed it up), so I'll ignore #4 from now on.
Update 2:
Preliminary numbers:
Running your code on the UI thread (!) of an i7-3770T 2.5GHz, Windows 8.1 64-bit:
- QVGA_size (320x240) 666,7 fps
- NTSC_size (720x480) 161,3 fps
- HR_size (1280x720) 64,1 fps
- HD_size (1920x1080) 29,2 fps
Update 3:
Running DrawImage instead:
- QVGA_size (320x240) 641,0 fps
- NTSC_size (720x480) 194,2 fps
- HR_size (1280x720) 77,2 fps
- HD_size (1920x1080) 33,4 fps
using this code:
public void DrawImage(Bitmap overlay, Bitmap background, Bitmap output)
{
    overlay.SetResolution(96, 96);
    background.SetResolution(96, 96);
    output.SetResolution(96, 96);
    using (Graphics G = Graphics.FromImage(output))
    {
        G.DrawImage(background, 0, 0);
        G.CompositingMode = CompositingMode.SourceOver;
        G.DrawImage(overlay, 0, 0);
    }
}
Update 4:
I have now tried a few more things and can say:
- Using bytes instead of int32 makes the code cleaner imo, but doesn't change its speed, so point #1 isn't important.
- If all your pixels have alpha-blending and you will always do this kind of blending, using DrawImage will be only fractionally faster.
- As for #2: Optimizing for alpha=0 and alpha=255 can make a huge difference, depending on the percentage of pixels with alpha-blending (i.e. pixels where 0 < alpha < 255). Unless most of your pixels are alpha-blended, this kind of optimization can almost double the framerate:
public unsafe void OverlayImage3(Bitmap overlay, Bitmap background, Bitmap output)
{
    // assumes all three bitmaps are 32bpp (4 bytes per pixel) and the same size
    Rectangle lrEntire = new Rectangle(new Point(), background.Size);
    BitmapData bdBack = background.LockBits(lrEntire,
        ImageLockMode.ReadOnly, background.PixelFormat);
    BitmapData bdOverlay = overlay.LockBits(lrEntire,
        ImageLockMode.ReadOnly, overlay.PixelFormat);
    BitmapData bdOut = output.LockBits(lrEntire,
        ImageLockMode.WriteOnly, output.PixelFormat);
    byte* pBack = (byte*)bdBack.Scan0;
    byte* pOverlay = (byte*)bdOverlay.Scan0;
    byte* pOut = (byte*)bdOut.Scan0;
    for (int luiToProcess = (bdBack.Height * bdBack.Stride) >> 2;
         luiToProcess > 0; luiToProcess--)
    {
        //get each pixel component
        byte red = *(pBack + 2);
        byte green = *(pBack + 1);
        byte blue = *(pBack + 0);
        byte oalpha = *(pOverlay + 3);
        byte ored = *(pOverlay + 2);
        byte ogreen = *(pOverlay + 1);
        byte oblue = *(pOverlay + 0);
        //calculate each output color component
        byte rOut, gOut, bOut;
        if (oalpha == 255)      // opaque overlay: copy the overlay pixel
        { rOut = ored; gOut = ogreen; bOut = oblue; }
        else if (oalpha == 0)   // transparent overlay: copy the background pixel
        { rOut = red; gOut = green; bOut = blue; }
        else                    // only here do we need the multiplications
        {
            rOut = (byte)((red * (255 - oalpha) + (ored * oalpha)) / 255);
            gOut = (byte)((green * (255 - oalpha) + (ogreen * oalpha)) / 255);
            bOut = (byte)((blue * (255 - oalpha) + (oblue * oalpha)) / 255);
        }
        *(pOut + 3) = 0xff;
        *(pOut + 2) = rOut;
        *(pOut + 1) = gOut;
        *(pOut + 0) = bOut;
        //move to the next pixel
        pBack += 4; pOverlay += 4; pOut += 4;
    }
    //release the locked bits again
    background.UnlockBits(bdBack);
    overlay.UnlockBits(bdOverlay);
    output.UnlockBits(bdOut);
}
A few more numbers:
OverlayImage3 with 5% of all pixels having alpha blending:
- QVGA_size (320x240) 1.282,1 fps
- NTSC_size (720x480) 320,5 fps
- HR_size (1280x720) 114,3 fps
- HD_size (1920x1080) 52,1 fps
OverlayImage3 with 60% of all pixels having alpha blending:
- QVGA_size (320x240) 917,4 fps
- NTSC_size (720x480) 256,4 fps
- HR_size (1280x720) 98,5 fps
- HD_size (1920x1080) 46,7 fps
OverlayImage3 with 95% of all pixels having alpha blending:
- QVGA_size (320x240) 714,3 fps
- NTSC_size (720x480) 220,8 fps
- HR_size (1280x720) 84,2 fps
- HD_size (1920x1080) 36,6 fps
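Since the gain depends so much on the share of partially transparent pixels, it may be worth measuring that share for a typical overlay before deciding. A rough sketch, assuming the overlay's pixels are available as a tightly packed 32bpp byte array in B, G, R, A order; the names AlphaStats and PartialAlphaFraction are made up for illustration:

```csharp
using System;

public static class AlphaStats
{
    // Returns the fraction (0.0 to 1.0) of pixels whose alpha is neither 0 nor 255,
    // i.e. the pixels that actually need the blending multiplications.
    // Assumes 4 bytes per pixel with alpha as the 4th byte (hypothetical helper).
    public static double PartialAlphaFraction(byte[] bgra)
    {
        int pixels = bgra.Length / 4;
        if (pixels == 0) return 0.0;

        int partial = 0;
        for (int i = 3; i < bgra.Length; i += 4)
        {
            byte a = bgra[i];
            if (a != 0 && a != 255) partial++;
        }
        return (double)partial / pixels;
    }
}
```

A result near 0.05 would put an overlay in the range of the first set of numbers above, a result near 0.95 in the last.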
DrawImage does profit from a lack of alpha-blending, too.
Point #3: parallel processing will help additionally, obviously, depending on your hardware.
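As a rough illustration of point #3, the rows of the image can be split across cores with Parallel.For, since each output row depends only on the same row of the two inputs. The sketch below is not benchmarked. To stay self-contained it works on managed byte arrays holding 32bpp B, G, R, A pixels; the names ParallelOverlay and Blend are made up. With locked bitmaps, the same idea would apply by starting each row at Scan0 + y * Stride inside the loop body.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelOverlay
{
    // Blends 'overlay' over 'back' into 'output', one row per Parallel.For iteration.
    // All buffers are assumed to be 32bpp BGRA with the same width, height and stride.
    public static void Blend(byte[] back, byte[] overlay, byte[] output,
                             int width, int height, int stride)
    {
        Parallel.For(0, height, y =>
        {
            int i = y * stride; // byte offset of the first pixel in row y
            for (int x = 0; x < width; x++, i += 4)
            {
                int a = overlay[i + 3];
                if (a == 255)
                {
                    // opaque overlay pixel: plain copy
                    output[i] = overlay[i];
                    output[i + 1] = overlay[i + 1];
                    output[i + 2] = overlay[i + 2];
                }
                else if (a == 0)
                {
                    // fully transparent overlay pixel: keep the background
                    output[i] = back[i];
                    output[i + 1] = back[i + 1];
                    output[i + 2] = back[i + 2];
                }
                else
                {
                    // partial alpha: same formula as in OverlayImage3
                    int ia = 255 - a;
                    output[i] = (byte)((back[i] * ia + overlay[i] * a) / 255);
                    output[i + 1] = (byte)((back[i + 1] * ia + overlay[i + 1] * a) / 255);
                    output[i + 2] = (byte)((back[i + 2] * ia + overlay[i + 2] * a) / 255);
                }
                output[i + 3] = 0xff; // output is always opaque
            }
        });
    }
}
```

Splitting by row keeps the threads from writing to the same pixels, so no locking is needed. For small frames the scheduling overhead may eat up part of the gain.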
Conclusion: I don't know your current resolution, but going from SD to HD takes 5-6x longer across all tests. So if you can only just do 25fps now, you will need more than the code above; you'll need parallel processing, I'd say.