Performance of the code for Convolution in spatial domain

Question

See: Image convolution in spatial domain

The following code achieves Linear convolution in Spatial domain.

    public static double[,] ConvolutionSpatial(double[,] paddedImage, double[,] mask, double offset)
    {
        double min = 0.0;
        double max = 1.0;

        double factor = GetFactor(mask);

        int paddedImageWidth = paddedImage.GetLength(0);
        int paddedImageHeight = paddedImage.GetLength(1);

        int maskWidth = mask.GetLength(0);
        int maskHeight = mask.GetLength(1);

        int imageWidth = paddedImageWidth - maskWidth;
        int imageHeight = paddedImageHeight - maskHeight;

        double[,] convolve = new double[imageWidth, imageHeight];

        for (int y = 0; y < imageHeight; y++)
        {
            for (int x = 0; x < imageWidth; x++)
            {
                double sum = Sum(paddedImage, mask, x, y);

                convolve[x, y] = Math.Min(Math.Max((sum / factor) + offset, min), max);

                string str = string.Empty;
            }
        }

        return convolve;
    }

    public static double Sum(double[,] paddedImage1, double[,] mask1, int startX, int startY)
    {
        double sum = 0;

        int maskWidth = mask1.GetLength(0);
        int maskHeight = mask1.GetLength(1);

        for (int y = startY; y < (startY + maskHeight); y++)
        {
            for (int x = startX; x < (startX + maskWidth); x++)
            {
                double img = paddedImage1[x, y];
                double msk = mask1[maskWidth - x + startX - 1, maskHeight - y + startY - 1];
                sum = sum + (img * msk);
            }
        }

        return sum;
    }

    public static double GetFactor(double[,] kernel)
    {
        double sum = 0.0;

        int width = kernel.GetLength(0);
        int height = kernel.GetLength(1);

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                sum += kernel[x, y];
            }
        }

        return (sum == 0) ? 1 : sum;
    }

Its performance is as follows:

 image-size     kernel-size   time-elapsed  
 ------------------------------------------  
 100x100        3x3                  13ms  
 512x512        3x3                 291ms
1018x1280       3x3                1687ms  
 100x100      100x100              4983ms  
 512x512      512x512          35624394ms  
1018x1280    1018x1280      [practically unusable]

I have two questions:

Does it look like a descent performance?
If not, how can I increase performance?

You will need to use Fourier transform, might [that answer](https://stackoverflow.com/questions/38709810/image-convolution-in-frequency-domain) helps. Oh, this is your question, so, you're on the right way. — Yurii N., Aug 27 '18 at 10:14

Dmytro Mukalov · Accepted Answer · 2018-08-27T15:44:21.033

It depends on your final requirements.

The obvious things to do: replace multidimensional arrays [,] with jagged arrays [][] and reverse the order of nested loops:

for (int x = 0; ...; x++)
{
    for (int y = 0; ...; y++)
    {
       ...
    }
}

instead of

for (int y = 0; ...; y++)
{
    for (int x = 0; ...; x++)
    {
       ...
    }
}

In the first case the CPU cache lines are used much more efficiently (because the left index represents continuous rows) while the latter one invalidates a cache line on every iteration.

So benchmarks of code with jagged arrays and reversed loops has shown 2 times better performance for the 1018x1280 3x3 convolution comparing to the original implementation:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.167 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
Frequency=3515624 Hz, Resolution=284.4445 ns, Timer=TSC
  [Host]    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3131.0
RyuJitX64 : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3131.0

Job=RyuJitX64  Jit=RyuJit  Platform=X64  

       Method |     Mean |     Error |    StdDev |
------------- |---------:|----------:|----------:|
 BenchmarkOld | 61.82 ms | 0.3979 ms | 0.3527 ms |
 BenchmarkNew | 26.98 ms | 0.1050 ms | 0.0982 ms |

Here is the code:

    public static double[][] ConvolutionSpatial(double[][] paddedImage, double[][] mask, double offset)
    {
        double min = 0.0;
        double max = 1.0;

        double factor = GetFactor(mask);

        int paddedImageWidth = paddedImage.Length;
        int paddedImageHeight = paddedImage[0].Length;

        int maskWidth = mask.Length;
        int maskHeight = mask[0].Length;

        int imageWidth = paddedImageWidth - maskWidth;
        int imageHeight = paddedImageHeight - maskHeight;

        double[][] convolve = new double[imageWidth][];

        for (int x = 0; x < imageWidth; x++)
        {
            convolve[x] = new double[imageHeight];
            for (int y = 0; y < imageHeight; y++)
            {
                double sum = Sum(paddedImage, mask, x, y);
                convolve[x][y] = Math.Min(Math.Max((sum / factor) + offset, min), max);
                string str = string.Empty;
            }
        }

        return convolve;
    }

    public static double Sum(double[][] paddedImage1, double[][] mask1, int startX, int startY)
    {
        double sum = 0;

        int maskWidth =  mask1.Length;

        for (int x = startX; x < (startX + maskWidth); x++)
        {
            var maskHeight = mask1[maskWidth - x + startX - 1].Length;
            for (int y = startY; y < (startY + maskHeight); y++)
            {
                double img = paddedImage1[x][y];
                double msk = mask1[maskWidth - x + startX - 1][maskHeight - y + startY - 1];
                sum = sum + (img * msk);
            }
        }

        return sum;
    }

    public static double GetFactor(double[][] kernel)
    {
        double sum = 0.0;

        int width = kernel.Length;

        for (int x = 0; x < width; x++)
        {
            var height = kernel[x].Length;
            for (int y = 0; y < height; y++)
            {
                sum += kernel[x][y];
            }
        }

        return (sum == 0) ? 1 : sum;
    }

And I think it can improved more with application of SIMD operations.

`1. It depends on your final requirements.` --- I have only one requirement: to make the code run faster. — user366312, Aug 27 '18 at 13:54
@anonymous There is definitely room for this - see the results of the modifications above. But improving performance is quite infinite subject. — Dmytro Mukalov, Aug 27 '18 at 15:33
`improving performance is quite infinite subject.` --- i agree, and i am aware of that. — user366312, Aug 27 '18 at 15:47

Performance of the code for Convolution in spatial domain

1 Answers1