
I am trying to replicate the outcome of this link using linear convolution in spatial-domain.

Images are first converted to 2D double arrays and then convolved. The image and the kernel are of the same size. The image is zero-padded before the convolution and cropped back to the original size afterwards.
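For reference, the padding step is assumed to work like the following sketch (`ImagePadder.Pad` itself is not shown in the question; this is a hypothetical stand-in that zero-pads, i.e. surrounds the image with black):

```csharp
using System;

public static class PadDemo
{
    // Zero-padding: place the image at the center of a larger
    // array whose border pixels are all 0.0 (black).
    public static double[,] ZeroPad(double[,] image, int pad)
    {
        int w = image.GetLength(0);
        int h = image.GetLength(1);
        double[,] padded = new double[w + 2 * pad, h + 2 * pad];

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                padded[x + pad, y + pad] = image[x, y];

        return padded;
    }
}
```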

[image: output of the spatial-domain convolution]

Compared to the FFT-based convolution, the output is clearly incorrect.

How can I solve the issue?

Note that I obtained the following image output from Matlab which matches my C# FFT output:

[image: Matlab reference output, matching the C# FFT output]


Update-1: Following @Ben Voigt's comment, I changed the `Rescale()` function to use 1.0 instead of 255.0, and the output improved substantially. Still, it doesn't match the FFT output (which is the correct one).
[image: Update-1 output]


Update-2: Following @Cris Luengo's comment, I padded the image by stitching and then performed the spatial convolution. The outcome:
[image: Update-2 output]

This output is worse than the previous one, but it resembles the 2nd output of the linked answer, which confirms that circular convolution is not the solution.
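For clarity, padding "by stitching" amounts to a periodic (wrap-around) extension of the image, and convolving over such padding reproduces exactly the circular convolution that an unpadded FFT product computes. A hypothetical sketch of that kind of padding:

```csharp
using System;

public static class PeriodicPadDemo
{
    // Periodic ("stitched") padding: source indices wrap around the
    // image borders, so the padded border repeats the opposite edge.
    public static double[,] PeriodicPad(double[,] image, int pad)
    {
        int w = image.GetLength(0);
        int h = image.GetLength(1);
        double[,] padded = new double[w + 2 * pad, h + 2 * pad];

        for (int y = 0; y < h + 2 * pad; y++)
        {
            for (int x = 0; x < w + 2 * pad; x++)
            {
                // ((a % n) + n) % n keeps the wrapped index non-negative.
                int sx = (((x - pad) % w) + w) % w;
                int sy = (((y - pad) % h) + h) % h;
                padded[x, y] = image[sx, sy];
            }
        }

        return padded;
    }
}
```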


Update-3: I used the `Sum()` function proposed in @Cris Luengo's answer. The result is a further improvement over **Update-1**:
[image: Update-3 output]

But it is still not identical to the FFT version.


Update-4: Following @Cris Luengo's comment, I subtracted the two outcomes to see the difference:
[image: difference 1], [image: difference 2]

1. spatial minus frequency domain
2. frequency minus spatial domain

The difference is substantial, which suggests the spatial convolution is not being done correctly.
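As a side note, the subtraction is only meaningful if both outputs have the same dimensions and the same scaling. A small helper (hypothetical, not part of the code below) to quantify the mismatch:

```csharp
using System;

public static class DiffDemo
{
    // Largest absolute pixel difference between two equally sized images.
    public static double MaxAbsDifference(double[,] a, double[,] b)
    {
        int w = a.GetLength(0);
        int h = a.GetLength(1);
        if (w != b.GetLength(0) || h != b.GetLength(1))
            throw new ArgumentException("Images must have the same dimensions.");

        double max = 0.0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                max = Math.Max(max, Math.Abs(a[x, y] - b[x, y]));

        return max;
    }
}
```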


Source Code:

(Let me know if you need to see more of the source code.)

    public static double[,] LinearConvolutionSpatial(double[,] image, double[,] mask)
    {
        int maskWidth = mask.GetLength(0);
        int maskHeight = mask.GetLength(1);

        double[,] paddedImage = ImagePadder.Pad(image, maskWidth);

        double[,] conv = Convolution.ConvolutionSpatial(paddedImage, mask);

        int cropSize = (maskWidth/2);

        double[,] cropped = ImageCropper.Crop(conv, cropSize);

        return cropped;
    }
    static double[,] ConvolutionSpatial(double[,] paddedImage1, double[,] mask1)
    {
        int imageWidth = paddedImage1.GetLength(0);
        int imageHeight = paddedImage1.GetLength(1);

        int maskWidth = mask1.GetLength(0);
        int maskHeight = mask1.GetLength(1);

        int convWidth = imageWidth - ((maskWidth / 2) * 2);
        int convHeight = imageHeight - ((maskHeight / 2) * 2);

        double[,] convolve = new double[convWidth, convHeight];

        for (int y = 0; y < convHeight; y++)
        {
            for (int x = 0; x < convWidth; x++)
            {
                int startX = x;
                int startY = y;

                convolve[x, y] = Sum(paddedImage1, mask1, startX, startY);
            }
        }

        Rescale(convolve);

        return convolve;
    } 

    static double Sum(double[,] paddedImage1, double[,] mask1, int startX, int startY)
    {
        double sum = 0;

        int maskWidth = mask1.GetLength(0);
        int maskHeight = mask1.GetLength(1);

        for (int y = startY; y < (startY + maskHeight); y++)
        {
            for (int x = startX; x < (startX + maskWidth); x++)
            {
                double img = paddedImage1[x, y];
                double msk = mask1[x - startX, y - startY];
                sum = sum + (img * msk);
            }
        }

        return sum;
    }

    static void Rescale(double[,] convolve)
    {
        int imageWidth = convolve.GetLength(0);
        int imageHeight = convolve.GetLength(1);

        double maxAmp = 0.0;

        for (int j = 0; j < imageHeight; j++)
        {
            for (int i = 0; i < imageWidth; i++)
            {
                maxAmp = Math.Max(maxAmp, convolve[i, j]);
            }
        }

        double scale = 1.0 / maxAmp;

        for (int j = 0; j < imageHeight; j++)
        {
            for (int i = 0; i < imageWidth; i++)
            {
                double d = convolve[i, j] * scale;
                convolve[i, j] = d;
            }
        }
    } 

    public static Bitmap ConvolveInFrequencyDomain(Bitmap image1, Bitmap kernel1)
    {
        Bitmap outcome = null;

        Bitmap image = (Bitmap)image1.Clone();
        Bitmap kernel = (Bitmap)kernel1.Clone();

        //linear convolution: sum. 
        //circular convolution: max
        uint paddedWidth = Tools.ToNextPow2((uint)(image.Width + kernel.Width));
        uint paddedHeight = Tools.ToNextPow2((uint)(image.Height + kernel.Height));

        Bitmap paddedImage = ImagePadder.Pad(image, (int)paddedWidth, (int)paddedHeight);
        Bitmap paddedKernel = ImagePadder.Pad(kernel, (int)paddedWidth, (int)paddedHeight);

        Complex[,] cpxImage = ImageDataConverter.ToComplex(paddedImage);
        Complex[,] cpxKernel = ImageDataConverter.ToComplex(paddedKernel);

        // call the complex function
        Complex[,] convolve = Convolve(cpxImage, cpxKernel);

        outcome = ImageDataConverter.ToBitmap(convolve);

        outcome = ImageCropper.Crop(outcome, (kernel.Width/2)+1);

        return outcome;
    } 
user366312

  • If you have a GPU you could try to run the calculations there. Image processing is often well suited for parallelism. Maybe something like this can help: https://devblogs.nvidia.com/hybridizer-csharp/ – Hans Kilian Jul 10 '18 at 10:49
  • For image processing, "time domain" implies a movie. If you mean "the domain on a still image which is not frequency", then that would be "spatial". – Ben Voigt Jul 11 '18 at 03:28
  • I don't know anything about C#, so maybe this is a stupid comment: what does the `string str = string.Empty;` do inside the double loop? You don't use `str` for anything, this could be slowing down things? -- Either way, convolution in the spatial domain with a large kernel is very expensive by definition. This is why the FFT method is so important. – Cris Luengo Jul 11 '18 at 03:49
  • This looks like one of the problems addressed in the answer you linked to -- integer wraparound. Does your image plotting library expect values in the range [0, 255] or [0.0, 1.0]? – Ben Voigt Jul 11 '18 at 03:53
  • @CrisLuengo, `string.Empty` is used for occasional debugging purposes. Removed it. – user366312 Jul 11 '18 at 04:06
  • @CrisLuengo, the loop will never go out of bounds as the image is always padded beforehand. Pad size is image dimension + mask dimension. – user366312 Jul 11 '18 at 04:07
  • @BenVoigt, Yes. That is why I am using `Rescale()`. But, it's not working properly. The `Rescale()` is taken from the linked question and then modified to work with double. – user366312 Jul 11 '18 at 04:07
  • @CrisLuengo, I know spatial-domain convolution is expensive. This is just done to develop my know-how. I am removing the performance part of the question. – user366312 Jul 11 '18 at 04:13
  • Re: "Rescale is modified to work with double". It doesn't look like it. You are still using the `255 / maxAmp` scaling factor. Images in floating-point format require a different scale than those in integer format. – Ben Voigt Jul 11 '18 at 04:16
  • @CrisLuengo, `convWidth` is the size of the convolved/output image. Padding operation is not shown here. Padding is done before and outside the `ConvolveTd()` function. – user366312 Jul 11 '18 at 04:26
  • Another question: does `ImagePadder` add `maskWidth` zero pixels all around the image? In principle `maskWidth-1` should be enough. And you should add `maskHeight-1` to top and bottom, though in this case these are the same sizes. You can then set `convWidth = imageWidth - (maskWidth-1)` (i.e. the size of the input image). – Cris Luengo Jul 14 '18 at 05:36
  • @CrisLuengo, `does ImagePadder add maskWidth zero pixels all around the image?` --- Yes. What it does is, it first creates a blank image with a black background and then places the original image at its center. – user366312 Jul 14 '18 at 07:08

2 Answers


Your current output looks more like the auto-correlation function than the convolution of Lena with herself. I think the issue might be in your Sum function.

If you look at the definition of the convolution sum, you'll see that the kernel (or the image, doesn't matter) is mirrored:

sum_m( f[n-m] g[m] )

For the one function, m appears with a plus sign, and for the other it appears with a minus sign.

You'll need to modify your Sum function to read the mask1 image in the right order:

    static double Sum(double[,] paddedImage1, double[,] mask1, int startX, int startY)
    {
        double sum = 0;

        int maskWidth = mask1.GetLength(0);
        int maskHeight = mask1.GetLength(1);

        for (int y = startY; y < (startY + maskHeight); y++)
        {
            for (int x = startX; x < (startX + maskWidth); x++)
            {
                double img = paddedImage1[x, y];
                double msk = mask1[maskWidth - x + startX - 1, maskHeight - y + startY - 1];
                sum = sum + (img * msk);
            }
        }

        return sum;
    }

The other option is to pass a mirrored version of mask1 to this function.
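A sketch of that mirroring (flipping both dimensions, i.e. a 180-degree rotation), assuming the kernel is a `double[,]` as elsewhere in the question:

```csharp
public static class MirrorDemo
{
    // Flip a 2D kernel in both dimensions (equivalent to rotating it
    // by 180 degrees); the result can be passed to the original Sum().
    public static double[,] Mirror(double[,] mask)
    {
        int w = mask.GetLength(0);
        int h = mask.GetLength(1);
        double[,] mirrored = new double[w, h];

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                mirrored[x, y] = mask[w - 1 - x, h - 1 - y];

        return mirrored;
    }
}
```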

Cris Luengo
  • Thank you for your answer. I think, modifying `Sum()` is not a good idea. Could you kindly tell me what is meant by a mirror here? Is it a horizontal flip? – user366312 Jul 11 '18 at 22:10
  • Besides, `Sum()` is giving an `index out of bounds` exception. – user366312 Jul 11 '18 at 22:17
  • @anonymous You need to mirror (flip) each dimension. For a 2D image, this is equivalent to rotating over 180 degrees. – Cris Luengo Jul 11 '18 at 22:25
  • I guess the `index out of bounds` is due to an off-by-one error. That's what I get for posting code I cannot even compile. :) – Cris Luengo Jul 11 '18 at 22:26
  • So, it is a vertical flip. Okay, I am giving a try. – user366312 Jul 11 '18 at 23:13
  • @anonymous: No, it is a vertical *and* a horizontal flip. You need to flip each of the image dimensions. – Cris Luengo Jul 12 '18 at 00:18
  • Okay, I tried your `Sum()` function. The output is more improved. Do you have any more trick up your sleeves to make it 100% compliant? Or, do I need to do something with the FFT output? – user366312 Jul 12 '18 at 01:35
  • @anonymous: Do you display these in the same way? Maybe there is a different gamma function applied for display? I tried replicating this experiment in MATLAB, and get the same results for spatial and frequency domain convolution, within 1/1e12, which is totally reasonable numerical accuracy for double-precision floats. But my output doesn't look like either of yours, it looks somewhere in between the two. This is why I suspect that the difference you see is a display issue rather than anything else. Did you try displaying the difference? – Cris Luengo Jul 12 '18 at 04:28
  • I have done the subtraction of outcomes and updated my question. See Update-4. – user366312 Jul 13 '18 at 23:38
  • @anonymous: Interesting, there are some strange artefacts to the bottom and right of that first image. Is that uninitialized data? Maybe not part of the result? The section that looks like the subtraction indicates that there is a bias in the result -- the top half is too low in intensity, the bottom half too high. I presume you're using the same padding for the FFT and spatial results? Is it same padding size too? – Cris Luengo Jul 14 '18 at 05:30
  • `I presume you're using the same padding for the FFT and spatial results? Is it same padding size too?` --- mm... apparently not. There is a minute difference. I have added the code. You can check it. – user366312 Jul 14 '18 at 07:33
  • are you out, or just gave me some time and hints to ponder upon? – user366312 Jul 16 '18 at 06:27
  • @anonymous: I haven’t had much time on SO during the weekend. I saw your updates, I didn’t see anything wrong. The only thing I can think of right now is to make the padding equal for both — or at least make sure the crop at the end yields results of the same size, so the outputs can be compared better. The result of the subtraction is odd, but I think it is because of different image sizes. – Cris Luengo Jul 16 '18 at 14:17

I found the solution in this link. The main clue was to introduce an offset and a factor.

  • factor is the sum of all values in the kernel.
  • offset is a constant added to every output pixel to shift the result into the displayable range.
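Computing the factor as described is straightforward (sketch; like the linked code, a real implementation should guard against a factor of 0 to avoid division by zero):

```csharp
public static class FactorDemo
{
    // Normalization factor: the sum of all kernel values.
    public static double KernelSum(double[,] kernel)
    {
        double sum = 0.0;

        for (int y = 0; y < kernel.GetLength(1); y++)
            for (int x = 0; x < kernel.GetLength(0); x++)
                sum += kernel[x, y];

        return sum;
    }
}
```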


@Cris Luengo's answer also raised a valid point.


The following source code is supplied in the given link:

    private void SafeImageConvolution(Bitmap image, ConvMatrix fmat) 
    { 
        //Avoid division by 0 
        if (fmat.Factor == 0) 
            return; 

        Bitmap srcImage = (Bitmap)image.Clone(); 

        int x, y, filterx, filtery; 
        int s = fmat.Size / 2; 
        int r, g, b; 
        Color tempPix; 

        for (y = s; y < srcImage.Height - s; y++) 
        { 
            for (x = s; x < srcImage.Width - s; x++) 
            { 
                r = g = b = 0; 

                // Convolution 
                for (filtery = 0; filtery < fmat.Size; filtery++) 
                { 
                    for (filterx = 0; filterx < fmat.Size; filterx++) 
                    { 
                        tempPix = srcImage.GetPixel(x + filterx - s, y + filtery - s); 

                        r += fmat.Matrix[filtery, filterx] * tempPix.R; 
                        g += fmat.Matrix[filtery, filterx] * tempPix.G; 
                        b += fmat.Matrix[filtery, filterx] * tempPix.B; 
                    } 
                } 

                r = Math.Min(Math.Max((r / fmat.Factor) + fmat.Offset, 0), 255); 
                g = Math.Min(Math.Max((g / fmat.Factor) + fmat.Offset, 0), 255); 
                b = Math.Min(Math.Max((b / fmat.Factor) + fmat.Offset, 0), 255); 

                image.SetPixel(x, y, Color.FromArgb(r, g, b)); 
            } 
        } 
    } 
user366312