Improve performance of maximum filter in 2D array by shape

Question

Let's say I have a 2D array:

let img = [[0, 1, 2, 1, 3, 0],
           [1, 1, 1, 1, 1, 1],
           [1, 2, 1, 0, 1, 1],
           [1, 1, 1, 1, 1, 1],
           [0, 1, 4, 1, 5, 0],
           ]

let shape = [[0,1,0],
             [1,1,1],
             [0,1,0]]

let diamond_shape = [[0, 0, 1, 0, 0],
                     [0, 1, 1, 1, 0],
                     [1, 1, 1, 1, 1],
                     [0, 1, 1, 1, 0],
                     [0, 0, 1, 0, 0]]

I place the center of the shape (the diamond) on each element of the image, row by row and column by column, take the maximum of the image values lying under the 1s of the shape, and replace the center value with that maximum. This is like dilation and erosion in image morphology.
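Both footprints above are diamonds: a cell belongs to the footprint when the absolute row offset plus the absolute column offset from the center is at most a radius. Assuming the 41x41 footprint used later follows the same pattern (radius 20), a hypothetical helper like the one below can generate it instead of typing it by hand. This is a sketch, not part of the original code.

```swift
/// Hypothetical helper: a (2r+1) x (2r+1) diamond footprint,
/// 1 where |rowOffset| + |colOffset| <= r from the center, 0 elsewhere.
func diamondFootprint(radius r: Int) -> [[Int]] {
    precondition(r >= 0, "radius must be non-negative")
    let size = 2 * r + 1
    return (0..<size).map { row -> [Int] in
        (0..<size).map { col -> Int in
            abs(row - r) + abs(col - r) <= r ? 1 : 0
        }
    }
}

// radius 1 gives `shape`, radius 2 gives `diamond_shape`, radius 20 gives a 41x41 diamond
for row in diamondFootprint(radius: 2) {
    print(row)
}
```

For radius r the diamond contains 2r² + 2r + 1 ones, so the 41x41 case has 841 ones out of 1681 cells.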

Here is my implementation in Swift:

class func maximum_filter(image: [[Int]], shape: [[Int]]) -> [[Int]] {
    let wShape = shape[0].count
    let hShape = shape.count
    let wImage = image[0].count
    let hImage = image.count
    // Int output to match the declared return type (the sample uses Int)
    var result = Array(repeating: Array(repeating: 0, count: wImage), count: hImage)
    for i in 0..<hImage {
        for ii in 0..<wImage {
            // Clip the window to the image and remember where the clipped
            // window starts inside the shape (assumes odd shape sizes)
            var startOfWZ = 0
            var startOfHZ = 0
            var wStart = ii - wShape/2
            if wStart < 0 {
                startOfWZ = -wStart   // skip the shape columns left of the image
                wStart = 0
            }
            var wEnd = ii + wShape/2
            if wEnd >= wImage {
                wEnd = wImage - 1
            }
            var hStart = i - hShape/2
            if hStart < 0 {
                startOfHZ = -hStart   // skip the shape rows above the image
                hStart = 0
            }
            var hEnd = i + hShape/2
            if hEnd >= hImage {
                hEnd = hImage - 1
            }

            // Walk the clipped window and the matching part of the shape together
            var hz = startOfHZ
            var maxNumber = 0
            for x in hStart...hEnd {
                var wz = startOfWZ
                for xx in wStart...wEnd {
                    if shape[hz][wz] == 1 {
                        let currentNumber = image[x][xx]
                        if currentNumber > maxNumber {
                            maxNumber = currentNumber
                        }
                    }
                    wz += 1
                }
                hz += 1
            }
            result[i][ii] = maxNumber
        }
    }

    return result
}

In the first 2 loops I iterate over each element of the matrix to place the center of the shape on it. In the next 2 loops I take every image element that lies under a 1 of the shape and compare them to get the maximum. Nothing complicated. The result is:

1 2 2 3 3 3
1 2 2 1 3 1 
2 2 2 1 1 1 
1 2 4 1 5 1
1 4 4 5 5 5

But when I try it with a real 4096x4096 image (the real input is Double, not Int as in the sample) and a 41x41 diamond shape, the performance is very slow (10 seconds) compared with Python (1 second). Here is the code I use in Python: result = maximum_filter(img, footprint=shape). I couldn't see the source code of maximum_filter to follow, so I implemented it myself. I get the same result, but the performance is much slower than theirs.

So using the same image and a similar approach, Python does it in 1 second and Swift takes 10 seconds? — Scriptable, May 10 '17 at 09:48
I couldn't see the code in Python. I implemented this code by myself. — hoangpx, May 10 '17 at 14:54
If you did not see the Python code, how do you know that it does the same computation in 1 second? — Martin R, May 10 '17 at 17:09
I mean the source code, the code they implemented. Here: https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.ndimage.filters.maximum_filter.html — hoangpx, May 10 '17 at 17:11
Can you add your Python program which does the same computation? — Martin R, May 10 '17 at 17:13

score 0 · Answer 1 · answered May 10 '17 at 18:11

You can't expect to achieve the same performance as a specialized function from a framework using the first algorithm that comes to mind. The code behind that Python function is probably optimized using advanced memory management techniques and different logic.

Just as an example of the kind of difference this can make, here's a naive algorithm for the same function that performs 20x faster than yours on a 4096 x 4096 image with a 41x41 crosshair shape:

func naiveFilter(image: [[Int]], shape: [[Int]]) -> [[Int]]
{
   let imageRows      = image.count
   let imageCols      = image.first!.count
   let shapeRows      = shape.count
   let shapeCols      = shape.first!.count
   let shapeCenterRow = shapeRows / 2
   let shapeCenterCol = shapeCols / 2

   // Rows/columns in [shapeCenter, outerIndex) can apply every offset without leaving the image
   let outerRowIndex  = imageRows - shapeRows + shapeCenterRow
   let outerColIndex  = imageCols - shapeCols + shapeCenterCol

   // (rowOffset, colOffset) from the center for every shape cell that is 1, in row order
   let shapeOffsets = shape.enumerated().flatMap { row, rowFlags in
      rowFlags.enumerated()
              .filter { $0.element == 1 }
              .map { (row - shapeCenterRow, $0.offset - shapeCenterCol) }
   }

   var maxValues  = image

   var imageRow   = 0
   var imageCol   = 0
   var imageValue = 0
   var maxValue   = 0

   for row in 0..<imageRows
   {
      let innerRow = row >= shapeCenterRow && row < outerRowIndex

      for col in 0..<imageCols
      {
         maxValue = 0

         if innerRow && col >= shapeCenterCol && col < outerColIndex
         {
            // Interior pixel: every offset stays inside the image
            for (rowOffset, colOffset) in shapeOffsets
            {
               imageValue = image[row + rowOffset][col + colOffset]
               if imageValue > maxValue { maxValue = imageValue }
            }
         }
         else
         {
            // Border pixel: skip offsets that fall outside the image
            for (rowOffset, colOffset) in shapeOffsets
            {
               imageRow = row + rowOffset
               imageCol = col + colOffset

               // Offsets are in row order, so the remaining ones are below the image too
               guard imageRow < imageRows else { break }

               if imageRow >= 0
               && imageCol >= 0
               && imageCol < imageCols
               {
                  imageValue = image[imageRow][imageCol]
                  if imageValue > maxValue { maxValue = imageValue }
               }
            }
         }
         if maxValue > 0 { maxValues[row][col] = maxValue }
      }
   }

   return maxValues
}
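A back-of-the-envelope count of cells visited per output pixel (an estimate that ignores loop overhead and border handling) suggests why the shape's content matters. With a k x k footprint, k = 41, the question's loops visit all k² cells and test each flag, while naiveFilter walks only the precomputed offsets of the 1 cells. Assuming "crosshair" means one full row plus one full column through the center, and that the diamond has radius r = (k - 1)/2:

```latex
\begin{aligned}
\text{question's loops (full window):} &\quad k^2 = 41^2 = 1681 \\
\text{crosshair offsets:} &\quad 2k - 1 = 81, \qquad 1681 / 81 \approx 20.8 \\
\text{diamond offsets, } r = 20: &\quad 2r^2 + 2r + 1 = 841, \qquad 1681 / 841 \approx 2.0
\end{aligned}
```

The crosshair ratio is in the same range as the 20x above; a 41x41 diamond fills about half of its window, so a much smaller gain would be expected for it.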

I didn't even go into flat memory models, range-check and retain-cycle optimization, register shifting, assembly code, or any of the hundreds of techniques that could have been used in the code you didn't see.
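The "flat memory model" idea can be sketched as follows, assuming a hypothetical FlatImage type that stores the whole image as one row-major [Double] (matching the real Double input) and reusing the answer's split between interior and border pixels. Interior pixels use precomputed linear offsets (dy * width + dx) into the single array, and the source is read through Array's withUnsafeBufferPointer. Both changes are common ways to cut per-access overhead; whether they pay off on a given machine and build configuration is something to measure, not assume.

```swift
/// Hypothetical row-major image type: one contiguous [Double] instead of [[Double]].
struct FlatImage {
    let width: Int
    let height: Int
    var pixels: [Double]   // pixel (row, col) is pixels[row * width + col]
}

/// Maximum filter over a 0/1 footprint. Neighbours outside the image are ignored.
/// Assumes the footprint contains its own center, so each output starts from the pixel's value.
func flatMaximumFilter(_ image: FlatImage, footprint: [[Int]]) -> FlatImage {
    precondition(image.pixels.count == image.width * image.height, "pixel count mismatch")
    let w = image.width, h = image.height
    let fh = footprint.count
    let fw = footprint.first?.count ?? 0
    let cy = fh / 2, cx = fw / 2

    // (dy, dx) from the center for every footprint cell that is 1, computed once
    var offsets: [(dy: Int, dx: Int)] = []
    for (y, row) in footprint.enumerated() {
        for (x, flag) in row.enumerated() where flag == 1 {
            offsets.append((dy: y - cy, dx: x - cx))
        }
    }
    // The same offsets as single steps through the flat array (only valid away from borders)
    let linearOffsets = offsets.map { $0.dy * w + $0.dx }

    var result = [Double](repeating: 0, count: image.pixels.count)
    image.pixels.withUnsafeBufferPointer { src in
        for y in 0..<h {
            // Rows/columns where every offset stays inside the image
            let innerRow = y >= cy && y < h - (fh - 1 - cy)
            for x in 0..<w {
                let i = y * w + x
                var m = src[i]
                if innerRow && x >= cx && x < w - (fw - 1 - cx) {
                    for d in linearOffsets { m = max(m, src[i + d]) }
                } else {
                    for (dy, dx) in offsets {
                        let yy = y + dy, xx = x + dx
                        if yy >= 0 && yy < h && xx >= 0 && xx < w {
                            m = max(m, src[yy * w + xx])
                        }
                    }
                }
                result[i] = m
            }
        }
    }
    return FlatImage(width: w, height: h, pixels: result)
}
```

Existing [[Double]] data can be flattened once with image.flatMap { $0 }, and results are read back as pixels[row * width + col]. Because each output starts from the pixel itself rather than from 0, it matches the question's results only for non-negative images.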

Your code takes 30 seconds to complete. Mine takes 18 seconds. Note that the real function takes the image as [[Double]]. I changed that to [[Int]] to support my sample. — hoangpx, May 10 '17 at 18:26
I'm not surprised by that difference, given that my naive algorithm's performance depends on the content of the shape (and of the image): it performs fewer operations when there are fewer 1s in the shape. That was not the point I was trying to make, though. Given that a simple change in algorithm can make such a big difference, we won't be comparing oranges to oranges unless we at least have the same processing logic. Only then can we start looking at the underpinnings of programming languages. — Alain T., May 10 '17 at 18:38
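As one illustration of how much a change in algorithm can matter (a swapped-in technique, not necessarily what SciPy does): a diamond of radius r is what you get by growing a 3x3 cross r times, so a diamond maximum filter can be computed as r passes of a 3x3 cross maximum filter. The sketch assumes [[Double]] input, a diamond footprint, and the question's border handling (neighbours outside the image are ignored); the function names are hypothetical.

```swift
/// One pass of a 3x3 cross ("plus") maximum filter.
/// Neighbours outside the image are ignored.
func crossMaxPass(_ image: [[Double]]) -> [[Double]] {
    let rows = image.count
    guard rows > 0 else { return image }
    let cols = image[0].count
    var out = image
    for r in 0..<rows {
        for c in 0..<cols {
            var m = image[r][c]
            if r > 0        { m = max(m, image[r - 1][c]) }
            if r < rows - 1 { m = max(m, image[r + 1][c]) }
            if c > 0        { m = max(m, image[r][c - 1]) }
            if c < cols - 1 { m = max(m, image[r][c + 1]) }
            out[r][c] = m
        }
    }
    return out
}

/// Diamond (|dr| + |dc| <= radius) maximum filter built from `radius` cross passes.
func diamondMaxFilter(_ image: [[Double]], radius: Int) -> [[Double]] {
    precondition(radius >= 0, "radius must be non-negative")
    var result = image
    for _ in 0..<radius {
        result = crossMaxPass(result)
    }
    return result
}
```

For radius 20 each pixel costs about 20 passes x 5 reads = 100 reads, against 841 footprint cells for the direct diamond; the actual speedup on a 4096x4096 image would need to be measured. Because each pass starts from the pixel itself rather than from 0, the results match the question's only for non-negative images. The trick relies on the footprint decomposing into repeated small shapes (diamonds here, squares via repeated 3x3 squares); an arbitrary footprint still needs the direct approach.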

