Demosaicing algorithm that contains downsampling

Question

Introduction: What I am working on.

Hello everyone! I am working on a Demosaicing algorithm which I use to transform images that have Bayer pattern into images that represent red, green and blue channels. I wish that the algorithm would have the following properties:

It preserves as much raw information as possible.
It does not obscure details in the image, even if that means absence of denoising.
It produces as little artifacts as possible.
If the size of mosaic image is N x N, the three color images should each have size N/2 x N/2.
Algorithm should be fast. To put "fast" into a context, let me say this: I will settle for something that is at least twice as fast as OpenCV's algorithm which uses bilinear interpolation.

What I have achieved so far.

So far, I've come up with the algorithm that uses bilinear interpolation and produces three images that have half size of the mosaic image. The algorithm is approximately 3-4 times faster than OpenCV's cvtColor algorithm that performs CV_BayerBG2BGR conversion (bilinear interpolation).

See the sketch of the Bayer pattern below to get an idea about how it works. I perform the interpolation at points marked by circles. The numbers represent the coefficients by which I multiply the underlying pixels in order to get interpolated value in the point marked by black circle.

You can observe the results of my algorithm below. I've also added the results of both demosaicing algorithms that are available in OpenCV (bilinear interpolation and variable number of gradients). Please note that while results of my algorithm look really poor in comparison, the OpenCV's bilinear interpolation results look almost exactly the same if I downsample them. This is of course expected as the underlying algorithm is the same.

... so finally: the question.

My current solution gives acceptable results for my project and it is also acceptably fast. However, I would be willing to use a up to twice slower algorithm if that would bring improvements to any of the 5 criteria listed above. The question then is: how to improve my algorithm without significantly hindering the performance?

I have enough programming experience for this task so I am not specifically asking for code snippets - the answers of any kind (code, links, suggestions - especially the ones based on past experiences) are welcome.

Some additional information:

I am working in C++.
The algorithm is highly optimized, it uses SSE instructions and it is non-parallel.
I work with large images (few MB in size); cache-awareness and avoiding multiple passes through image are very important.

I am not looking for general programming advice (such as optimization in general, etc.), but on the other hand some task-specific answers are more than welcome. Thank you in advance.

Very interesting question. For interpolation you need to correct the formulas so that some of the pixels have a 0.5 position offset. — Mark Ransom, Mar 30 '17 at 22:30
I am not sure I understand what you mean by that. If I interpolate exactly at the points denoted by circles, the formulas are correct. But yeah, that means that the origins of three produced images are equally offseted (by half pixel) in comparison to the original image. But that is not really a problem in my case. — Nejc, Mar 31 '17 at 10:22
Can you provide a link to a .png of your mosaic image? The one embedded in the question has impure colors. — Mark Ransom, Apr 03 '17 at 05:30
If you are halving the image size, why bother with interpolation? Why not average the green and just use the red and blue values directly? — NetMage, May 19 '17 at 22:19

score 4 · Answer 1 · answered Mar 02 '18 at 04:43

High quality results are obtained by filtering the samples to their Nyquist frequency. Usually that's half the sample rate, but in this case since your red and blue samples only come at 1/2 the pixel rate then Nyquist will be 1/4 the sample rate. When you resize you need to filter to the lower of the Nyquist rate of the input and output, but since your output is 1/2 your input you again need to filter to 1/4.

The perfect filter is the Sinc function; it delivers 100% of the signal below the cutoff frequency and none above the cutoff. Unfortunately it's completely impractical, extending as it does to infinity. For practical applications a windowed Sinc is used instead, the most common of these being the Lanczos kernel. The window size is chosen on the basis of quality vs. speed, with higher orders being closer to the ideal Sinc function. In your case since speed is paramount I will suggest Lanczos2.

The cutoff frequency of the filter is inversely proportional to its width. Since in this case we want the cutoff to be half of the normal cutoff, the filter will be stretched to twice its normal width. Ordinarily a Lanczos2 filter will require inputs up to but not including ⁺/-2 pixels from the center point; stretching it by 2 requires inputs up to ⁺/-4 from the center.

The choice of a center point is completely arbitrary once you have a good cutoff filter. In your case you chose a point that was midway between 4 sample points. If we choose instead a point that is exactly on one of our input samples, some interesting things happen. Many of the filter coefficients become zero, which means those pixels don't have to be included in the calculations. In the example below I've centered on the Red pixel, and we find that red pixels need no filtering at all! Following is a diagram with Lanczos2 filter values scaled so that they total to 1.0 for each color, followed by the formulas that result.

red = p[x][y]
green = (p[x][y-3] + p[x-3][y] + p[x+3][y] + p[x][y+3]) * -0.03125 +
        (p[x][y-1] + p[x-1][y] + p[x+1][y] + p[x][y+1]) * 0.28125
blue = (p[x-3][y-3] + p[x+3][y-3] + p[x-3][y+3] + p[x+3][y+3]) * 0.00391 +
       (p[x-1][y-3] + p[x+1][y-3] + p[x-3][y-1] + p[x+3][y-1] + p[x-3][y+1] + p[x+3][y+1] + p[x-1][y+3] + p[x+1][y+3]) * -0.03516 +
       (p[x-1][y-1] + p[x+1][y-1] + p[x-1][y+1] + p[x+1][y+1]) * 0.31641

If you'd prefer to keep everything in the integer domain this works very well with fixed point numbers too.

red = p[x][y]
green = ((p[x][y-3] + p[x-3][y] + p[x+3][y] + p[x][y+3]) * -32 +
         (p[x][y-1] + p[x-1][y] + p[x+1][y] + p[x][y+1]) * 288) >> 10
blue = ((p[x-3][y-3] + p[x+3][y-3] + p[x-3][y+3] + p[x+3][y+3]) * 4 +
        (p[x-1][y-3] + p[x+1][y-3] + p[x-3][y-1] + p[x+3][y-1] + p[x-3][y+1] + p[x+3][y+1] + p[x-1][y+3] + p[x+1][y+3]) * -36 +
        (p[x-1][y-1] + p[x+1][y-1] + p[x-1][y+1] + p[x+1][y+1]) * 324) >> 10

The green and blue pixel values may end up outside of the range 0 to max, so you'll need to clamp them when the calculation is complete.

score 1 · Answer 2 · answered Jul 05 '17 at 17:19

I'm a bit puzzled by your algorithm and won't comment on it... but to put some things into perspective...

OpenCV is a library which contains a lot of generic stuff to get the job done, and sometimes is deliberately not performance-over-optimized, there is a cost-maintainability tradeoff and “good enough is better than better”. There is a trove of people selling performance-optimized libraries implementing some of OpenCV's features, sometimes with the exact same API. I have not used it, but OpenCV has a cv::gpu::cvtColor() which could be achieving your goals, out of the box, assuming it's implemented for demosaicing, and that you have a suitable GPU.

Considering the bilinear demosaicing, a less-maintainable but more-optimized CPU implementation can run much faster than the one from OpenCV, I'd estimate above 250 Mpx/s on one mainstream CPU core.

Now to elaborate on the optimization path...

First, because demosaicing is a local operation, cache awareness is really not a significant problem.

An performance-optimized implementation will have different code paths depending on the image dimensions, Bayer pattern type, instruction sets supported by the CPU (and their speed/latency), for such a simple algorithm it's going to become a lot of code.

There are SIMD instructions to perform shuffling, arithmetic including averaging, streaming memory writes, which you'd find useful. Intel's summary is not so bad to navigate, and Agner Fog's site is also valuable for any kind of implementation optimization. AVX & AVX2 provide several interesting instructions for pixel processing.

If you are more the 80/20 kind of person (good for you!), you'll appreciate working with a tool like Halide which can generate optimized stencil code like a breeze (modulo the learning curve, which is already setting you back a few days from a 1-hour naive implementation or the 10 minutes using OpenCV) and especially handles the boundary conditions (image borders).

You can get a little further (or take an alternative road) by using compiler intrinsics to access specific CPU instructions, at this point your code is now 4x costlier (in terms of development cost), and will probably get you 99% as far as hand-crafted assembly ($$$ x4 again).

If you want to squeeze the last drop (not generally advisable), you will definitely have to perform days of implementation benchmarks, to see which sequence of instructions can get you the best performance.

But also, GPUs... you can use your integrated GPU to perform demosaicing, it may be a little faster than the CPU and has access to the main memory... of course you'd have to care about pre-allocating shared buffers. A discrete GPU would have a more significant transfer overhead, at these fill rates.

score 0 · Answer 3 · answered Feb 28 '19 at 09:02

I know I'm late to the discussion but I want to contribute my ideas as this is a problem that I have been thinking about myself. I see how you determined your coefficients from a linear interpolation of the 4 closest red or blue pixels. This would give you the best possible result if the intensity per color channel would vary linearly.

De-bayering artifacts, however, are most significant at color edges. In your case you would interpolate over the color edge, which would give you worse results than simply picking the closest red or blue pixel.

This is why I average the green pixels and take the closest red and blue pixel for my combined De-bayering and down-sampling. I believe that this should work better at color edges, but it would work less well for image areas with gradually varying color.

I haven't yet had an idea yet of what would be the optimal way to do this.

score 0 · Answer 4 · answered Apr 27 '20 at 03:47

I actually implemented exactly what you're talking about here, in Halide.

You should read the paper by Morgan McGuire that I used as a reference... it's less about how much the neighbors factor into the output identity pixel and more about which pixels you're looking at to do a straight average.

Demosaicing algorithm that contains downsampling

Introduction: What I am working on.

What I have achieved so far.

... so finally: the question.

4 Answers4