2

I wrote this code to remove annoying patterns in a video due to camera malfunction. The problem is that to encode a 2 Minute video this algorithm needs more than 2 hours. I want to significantly reduce the time needed.

The algorithm iterates over each image, looks at each 4 pixels from there, creates the average and if the average is below a threshold, sets the current pixel to white. I could use step = 2 and set the 2x2 matrix to white, yet this deteriorates images quality and only increases speed by half.

I already added the lockbitmap, lockbits and improved the auxiliary functions.

Before and after the using (d2 = reader.ReadVideoFrame())-snippet I have an aforge videoreader and writer based on ffmpeg.

using (d2 = reader.ReadVideoFrame())
{
    LockBitmap lockBitmap = new LockBitmap(d2);
    lockBitmap.LockBits();

    Color v = Color.FromArgb(240, 237, 241);
    for (int x = 0; x < lockBitmap.Width-1; x = x + 1)
    {
        for (int y = 0; y < lockBitmap.Height-1; y = y + 1)
        {
            Color dus = durchschnitt(lockBitmap.GetPixel(x, y),
                lockBitmap.GetPixel(x + 1, y),
                lockBitmap.GetPixel(x, y + 1),
                lockBitmap.GetPixel(x + 1, y + 1));
            if (abstand(dus, v) < 50)
            {
                lockBitmap.SetPixel(x, y, Color.White);
            }
        }
    }
    lockBitmap.UnlockBits();
}

Auxiliary functions:

private Color durchschnitt(Color c1, Color c2, Color c3, Color c4)
{
    return Color.FromArgb((int)((c1.R + c2.R + c3.R + c4.R) / 4),
        (int)((c1.G + c2.G + c3.G + c4.G) / 4),
        (int)((c1.B + c2.B + c3.B + c4.B) / 4));
}

and

private double abstand(Color c1, Color c2)
{
    return Math.Sqrt(Math.Pow(c2.R - c1.R, 2) + Math.Pow(c2.G - c1.G, 2) + Math.Pow(c2.B - c1.B, 2));
}

LockBitmap is from here.

Fabien Biller
  • 155
  • 1
  • 1
  • 10

1 Answers1

1

This is not how lockBits work.

In short you need to lock the bits to get access to the scanline via pointers. Best to use 32bpp for integer access. You can calculate the pixel in the contiguous array as follows.

You will need to decorate you class or your method with the unsafe keyword and also set the project build option to use unsafe code, you use pointers.

var w = bmp.Width;
var h = bmp.Height;

// lock the array for direct access
var data = bmp.LockBits(new Rectangle(0, 0, w, h), ImageLockMode.ReadWrite, PixelFormat.Format32bppPArgb);
var scan0Ptr = (int*)data.Scan0;

// get the stride
var stride = data.Stride / 4;

// scan all x
for (var x = 0; x < w; x++)
{
   var pX = scan0Ptr + x;

   // scan all y
   for (var y = 0; y < h; y++)
   {
      // this is now your pixel *p, which essentially is a pointer to
      // to a memory which contains your pixel
      var p = pX + y * stride;

      // or for better access to X and Y
      var p = scan0Ptr + x + y * stride;

      // or alternatively you can just access you pixel with the following notation  
      *(scan0Ptr + x + y * stride) // <== just use this like any variable 

      // example how to convert pixel *p to a Color
      var color = Color.FromArgb(*p);

      // example Convert a Color back to *p and update the pixel
      *p = Color.White.ToArgb();
   }
}
// unlock the bitmap
bmp.UnlockBits(data);

I'll leave the rest of the details up to you, however this will give you the best performance (minus some micro optimizations).

Lastly reducing the calls to external methods will also give you a performance boost; however, if you really need to, you can help the CLR by using MethodImplOptions Enum for AggressiveInlining

E.g

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private Color durchschnitt(Color c1, Color c2, Color c3, Color c4)

Also you might get a performance boost, by shifting out the components:

var r = ((*p >> 16) & 255);
var g = ((*p >> 8) & 255);
var b = ((*p >> 0) & 255);

And you could probably multi-thread this workload.

halfer
  • 19,824
  • 17
  • 99
  • 186
TheGeneral
  • 79,002
  • 9
  • 103
  • 141
  • By using your recommendations, I reached 1 hour for 2:10 1920 x 1080 Video at 52.26fps. Yes, `unsafe` is necessary. – Fabien Biller Oct 14 '18 at 01:05
  • @FabienBiller the next step would be get this happening on multiple cores – TheGeneral Oct 14 '18 at 01:06
  • I do have 8 and cuda. Do you have ideas? – Fabien Biller Oct 14 '18 at 01:07
  • 1
    Maybe best to ask another question, but yeah definitely threaded or gpu would probably eat this – TheGeneral Oct 14 '18 at 01:08
  • @FabienBiller Also id start micro optimizing it, you could knock a lot off by making sure every line of code is efficient in the loop, ie there is a lot of conversions to and from color ect – TheGeneral Oct 14 '18 at 01:10
  • now I am at 49 minutes having removed all `FromArgb`. – Fabien Biller Oct 14 '18 at 01:20
  • @FabienBiller now i think you can smash it in half or more with multiple threads, or even more on the cuda – TheGeneral Oct 14 '18 at 01:26
  • With Parallel.Foreach I brought it down to 12 minutes. Let's see, what CUDA can do. BTW: How many elements, I use a List, should be in it at once? I am currently using 30. It makes my CPU peak, memory is fine though. – Fabien Biller Oct 14 '18 at 02:31
  • @FabienBiller The MaxDegreeOfParallelism property affects the number of concurrent operations run by Parallel method calls that are passed this ParallelOptions instance. A positive property value limits the number of concurrent operations to the set value. If it is -1, there is no limit on the number of concurrently running operations. – TheGeneral Oct 14 '18 at 02:35
  • By default, For and ForEach will utilize however many threads the underlying scheduler provides, so changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used. – TheGeneral Oct 14 '18 at 02:35
  • @FabienBiller So dont limit what goes into the for loop unless you have memory problems, and dont set the MaxDegreeOfParallelism, however experiment and see what works best for you – TheGeneral Oct 14 '18 at 02:43