I am trying to learn kernel convolution for image processing. Now, I understand the concept of kernel convolution, but I am a bit confused about code that I have found for it at https://www.pyimagesearch.com/2016/07/25/convolutions-with-opencv-and-python/

Specifically, I am confused about the bounds in the for loops and the location of the convolution output.

import cv2
import numpy as np

def convolve(image, kernel):
    # grab the spatial dimensions of the image, along with
    # the spatial dimensions of the kernel
    (iH, iW) = image.shape[:2]
    (kH, kW) = kernel.shape[:2]

    # allocate memory for the output image, taking care to
    # "pad" the borders of the input image so the spatial
    # size (i.e., width and height) is not reduced
    pad = (kW - 1) // 2
    image = cv2.copyMakeBorder(image, pad, pad, pad, pad,
        cv2.BORDER_REPLICATE)
    output = np.zeros((iH, iW), dtype="float32")

    # loop over the input image, "sliding" the kernel across
    # each (x, y)-coordinate from left-to-right and top to
    # bottom
#QUESTION 1 SECTION BEGIN
    for y in np.arange(pad, iH + pad):
        for x in np.arange(pad, iW + pad):
            # extract the ROI of the image by extracting the
            # *center* region of the current (x, y)-coordinates
            # dimensions
            roi = image[y - pad:y + pad + 1, x - pad:x + pad + 1]

#QUESTION 1 SECTION END

            # perform the actual convolution by taking the
            # element-wise multiplication between the ROI and
            # the kernel, then summing the matrix
            k = (roi * kernel).sum()

#QUESTION 2 SECTION BEGIN

            # store the convolved value in the output (x,y)-
            # coordinate of the output image
            output[y - pad, x - pad] = k

#QUESTION 2 SECTION END

    # return the convolved image (still float32, not rescaled)
    return output

Question 1: Why does np.arange run from pad to iH+pad, and not from pad to iH-pad? I assume that we start from pad so that the center pixel in the region of interest is never on the edge of the image. However, I would think that going to iH+pad would overshoot and put the center pixel outside the image dimensions.

Question 2: This code stores the output pixel at a location up and to the left of where I centered my convolution roi, doesn't it? If so, could someone explain the logic behind doing this?

Thank you!

1 Answer

np.arange(pad, iH + pad) runs over iH rows, which is the height of the original input image. The padded image has a height of iH + 2*pad, so the loop runs from pad pixels after the start to pad pixels before the end of an image column, such that one can index up to pad pixels in both directions without leaving the padded image.
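
To make the bounds concrete, here is a small sketch with made-up sizes (a 5-row image and a 3-row kernel, both assumptions for illustration only). It lists the center rows the loop visits and the extreme rows the ROI slice touches:

```python
import numpy as np

iH, kH = 5, 3                  # hypothetical image height and kernel height
pad = (kH - 1) // 2            # 1 for a 3-row kernel
padded_h = iH + 2 * pad        # the padded image has 7 rows

ys = np.arange(pad, iH + pad)  # center rows visited by the outer loop
print(ys)                      # [1 2 3 4 5]

# The ROI for a center row y spans rows [y - pad, y + pad + 1).
first_row = ys[0] - pad                # topmost row ever read
last_row_exclusive = ys[-1] + pad + 1  # one past the bottommost row ever read
print(first_row, last_row_exclusive, padded_h)
```

The topmost row read is row 0 and the slice ends exactly at the padded height, so the ROI never leaves the padded image. Stopping at iH - pad instead would skip the last 2*pad rows of the original image.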

Regarding your second question: the input image was padded, and x and y index into the padded image. image[pad, pad] is the top-left pixel of the original image before padding, and it corresponds to output[0, 0]. output is not padded.
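
The offset can be checked with a short sketch. It assumes a tiny 3x4 test image and uses np.pad with mode="edge" as a stand-in for cv2.copyMakeBorder with cv2.BORDER_REPLICATE, so that it runs without OpenCV; the identity kernel is also just an illustrative choice:

```python
import numpy as np

original = np.arange(12, dtype="float32").reshape(3, 4)  # hypothetical 3x4 image
iH, iW = original.shape
pad = 1                                                  # matches a 3x3 kernel

# Stand-in for cv2.copyMakeBorder(..., cv2.BORDER_REPLICATE): repeat edge pixels.
padded = np.pad(original, pad, mode="edge")

# Identity kernel: each output pixel should equal the pixel under the ROI center.
kernel = np.zeros((3, 3), dtype="float32")
kernel[1, 1] = 1.0

output = np.zeros_like(original)
for y in range(pad, iH + pad):
    for x in range(pad, iW + pad):
        roi = padded[y - pad:y + pad + 1, x - pad:x + pad + 1]
        # (y, x) is a padded-image coordinate; subtract pad to get the
        # matching coordinate in the unpadded output.
        output[y - pad, x - pad] = (roi * kernel).sum()

print(padded[pad, pad] == original[0, 0])
print(np.array_equal(output, original))
```

With the identity kernel the output reproduces the original image, which shows that output[y - pad, x - pad] is the same pixel as the ROI center and is not shifted up or to the left.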

Cris Luengo
  • Hi Cris. Thank you for getting back to me. The code actually does work without error. – ExactPlace441 Aug 24 '21 at 21:43
  • And the answer to my second question seems to make sense. My target image is padded, so I need to access "unpadded" pixels in my output. To do this, I "neglect the padding" in my output by subtracting it from the indices. Correct? – ExactPlace441 Aug 24 '21 at 21:46
  • @ExactPlace441 Indeed, the output image does not have the padding, and so indexing needs to subtract the padding size. I have amended the answer for the first question. I evaluated that too quickly on the first read, and arrived at the wrong conclusion. – Cris Luengo Aug 25 '21 at 00:27