
I need help figuring out the algorithm/implementation OpenCV uses for image downsampling with non-integer scaling factors.

I know the question has already been asked a few times, but most answers do not seem to match OpenCV's implementation (for instance, this answer does not describe what OpenCV actually does: https://math.stackexchange.com/questions/48903/2d-array-downsampling-and-upsampling-using-bilinear-interpolation).

Minimal problem formulation:

I want to downsample an image of resolution 4x4 to an image of resolution 3x3 using bilinear interpolation. I am interested in the interpolation coefficients.

Example in python:

import numpy as np
import cv2

img = np.asarray([[ 1,  2,  3,  4],
                  [ 5,  6,  7,  8],
                  [ 9, 10, 11, 12],
                  [13, 14, 15, 16]]).astype(np.float32)

# Pass the interpolation flag by keyword: passed positionally, the extra
# arguments fill the dst/fx/fy slots and never reach `interpolation`.
img_resized = cv2.resize(img, (3, 3), interpolation=cv2.INTER_LINEAR)

print(img)
# [[ 1.  2.  3.  4.]
#  [ 5.  6.  7.  8.]
#  [ 9. 10. 11. 12.]
#  [13. 14. 15. 16.]]

print(img_resized)
# [[ 1.8333333  3.1666667  4.5      ]
#  [ 7.166667   8.5        9.833333 ]
#  [12.5       13.833333  15.166666 ]] 

Interpolation coefficients:

After a lot of trial-and-error, I figured out the interpolation coefficients OpenCV is using for this specific case.

For the corner points of the 3x3 image:

 1.8333333 = 25/36 *  1 + 5/36 *  2 + 5/36 *  5 + 1/36 *  6
 4.5000000 = 25/36 *  4 + 5/36 *  3 + 5/36 *  8 + 1/36 *  7
12.5000000 = 25/36 * 13 + 5/36 *  9 + 5/36 * 14 + 1/36 * 10
15.1666666 = 25/36 * 16 + 5/36 * 15 + 5/36 * 12 + 1/36 * 11

For the middle points of the 3x3 image:

8.5 = 1/4 * 6 + 1/4 * 7 + 1/4 * 10 + 1/4 * 11

For the remaining 4 points of the 3x3 image:

 3.1666667 = 5/12 *  2 + 5/12 *  3 + 1/12 *  6 + 1/12 *  7
 7.1666667 = 5/12 *  5 + 5/12 *  9 + 1/12 *  6 + 1/12 * 10
 9.8333333 = 5/12 *  8 + 5/12 * 12 + 1/12 *  7 + 1/12 * 11
13.833333  = 5/12 * 14 + 5/12 * 15 + 1/12 * 10 + 1/12 * 11
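
As a quick numeric sanity check (an illustrative snippet of mine, not part of OpenCV), the first corner weighting indeed reproduces the printed output:

import numpy as np

# Sanity check: the corner weights above reproduce img_resized[0, 0].
w = np.array([25.0, 5.0, 5.0, 1.0]) / 36.0
neighbours = np.array([1.0, 2.0, 5.0, 6.0])  # pixels 1, 2, 5, 6 of img
print(np.dot(w, neighbours))  # 1.8333... matches the printed 1.8333333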

Question:

Can someone please help me make sense of these interpolation coefficients? How are they calculated? I tried to read the source of the cv::resize() function, but it did not help me much.

  • If those weights you figured out are actually correct, then it's just doing linear interpolation straight on, without low-pass filtering. If so, the question is just "how does it determine where to put the output points?" – Cris Luengo Apr 25 '19 at 21:52
  • @CrisLuengo: Yes that is correct. As far as I know, OpenCV does not apply a low-pass filter (e.g. blurring) before downsampling. Thus, the question is equivalent to finding the deterministic sample point locations (--> see answer below). – Lodrik Apr 26 '19 at 09:26

1 Answer


After playing around with various test cases, I think I know how OpenCV chooses the sample point locations. As @CrisLuengo pointed out in a comment, OpenCV does not seem to apply a low-pass filter before downsampling; it uses (bi-)linear interpolation only.

(Possible) Solution:

Let's assume we have a 5x5 image whose pixel positions are represented by the blue circles in the graphics below. We now want to downsample it to a 3x3 or a 4x4 image, so we need to find the sample positions of the new, downsampled image within the original image grid.

OpenCV appears to use a pixel distance of 1 for the original image grid, and a pixel distance of OLD_SIZE / NEW_SIZE (here 5/3 and 5/4) for the new image grid. Moreover, it aligns both grids at their center points. OpenCV's deterministic sampling can thus be visualized as follows:

Visualization 5x5 to 3x3:

[Figure: sample point locations when downsampling a 5x5 image to a 3x3 image with cv2.resize and bilinear interpolation]

Visualization 5x5 to 4x4:

[Figure: sample point locations when downsampling a 5x5 image to a 4x4 image with cv2.resize and bilinear interpolation]
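
To make the alignment concrete, here is a minimal sketch that prints the 1D sample positions for both cases. The closed form used below is my algebraic simplification of the center-alignment rule, not code taken from OpenCV:

# Print the 1D sample positions (in original-grid coordinates) for 5 -> 3
# and 5 -> 4, assuming the center-aligned grids described above.
OLD_SIZE = 5

for NEW_SIZE in (3, 4):
    scale = float(OLD_SIZE) / NEW_SIZE
    positions = [(i + 0.5) * scale - 0.5 for i in range(NEW_SIZE)]
    print(NEW_SIZE, ["%.3f" % p for p in positions])

# Expected:
# 3 ['0.333', '2.000', '3.667']
# 4 ['0.125', '1.375', '2.625', '3.875']

Both position sets are symmetric about the original grid's center (2.0), which is exactly the center alignment described above.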

Sample Code (Python 3):

import numpy as np
import cv2


# 1. H_W is the height & width of the original image, using uniform H/W for this example
#    resized_H_W is the height & width of the resized image, using uniform H/W for this example

H_W = 5
resized_H_W = 4



# 2. Create original image & Get OpenCV resized image:

img = np.zeros((H_W, H_W)).astype(np.float32)

counter = 1

for i in range(0, H_W):
    for j in range(0, H_W):
        img[i, j] = counter
        counter += 1

img_resized_opencv = cv2.resize(img, (resized_H_W, resized_H_W), interpolation=cv2.INTER_LINEAR)



# 3. Get own resized image:

img_resized_own = np.zeros((resized_H_W, resized_H_W)).astype(np.float32)

for i in range(0, resized_H_W):
    for j in range(0, resized_H_W):
        # Center-aligned sample position in the original grid; algebraically
        # this simplifies to (i + 0.5) * H_W / resized_H_W - 0.5.
        sample_x = (1.0 * H_W) / 2.0 - 0.50 + (i - (1.0 * resized_H_W - 1.0) / 2.0) * (1.0 * H_W) / (1.0 * resized_H_W)
        sample_y = (1.0 * H_W) / 2.0 - 0.50 + (j - (1.0 * resized_H_W - 1.0) / 2.0) * (1.0 * H_W) / (1.0 * resized_H_W)

        # The four neighbouring pixels. For downsampling (H_W >= resized_H_W)
        # the sample positions stay within [0, H_W - 1], so no clamping is needed.
        pixel_top_left  = img[int(np.floor(sample_x)), int(np.floor(sample_y))]
        pixel_top_right = img[int(np.floor(sample_x)), int(np.ceil(sample_y))]
        pixel_bot_left  = img[int(np.ceil(sample_x)),  int(np.floor(sample_y))]
        pixel_bot_right = img[int(np.ceil(sample_x)),  int(np.ceil(sample_y))]

        # Standard bilinear weights from the fractional parts of the position.
        frac_x = sample_x - np.floor(sample_x)
        frac_y = sample_y - np.floor(sample_y)

        img_resized_own[i, j] = (1.0 - frac_x) * (1.0 - frac_y) * pixel_top_left  + \
                                (1.0 - frac_x) * frac_y         * pixel_top_right + \
                                frac_x         * (1.0 - frac_y) * pixel_bot_left  + \
                                frac_x         * frac_y         * pixel_bot_right



# 4. Print results:

print("\nOrg. image:\n", img)
print("\nResized image (OpenCV):\n", img_resized_opencv)
print("\nResized image (own):\n", img_resized_own)

# If the sampling theory above is correct, the MSE should be (numerically) zero.
print("\nMSE between OpenCV <-> Own:", np.mean(np.square(img_resized_opencv - img_resized_own)))

Disclaimer:

This is just my theory that I tested via ~10 test cases. I do not claim that this is 100% true.
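
That said, under the same assumption the mapping also reproduces the exact weights from the question's 4x4 to 3x3 example; this check is mine, not taken from OpenCV's source:

import numpy as np

# Derive the 1D weight pairs for the question's 4x4 -> 3x3 case from the
# assumed mapping sample = (i + 0.5) * OLD_SIZE / NEW_SIZE - 0.5.
OLD_SIZE, NEW_SIZE = 4, 3
scale = float(OLD_SIZE) / NEW_SIZE

for i in range(NEW_SIZE):
    s = (i + 0.5) * scale - 0.5
    f = s - np.floor(s)          # fractional part -> interpolation weight
    print(i, s, (1.0 - f, f))

# i=0: s=1/6  -> (5/6, 1/6); the 2D weights are products of these pairs,
#       e.g. (5/6)*(5/6) = 25/36, (5/6)*(1/6) = 5/36, (1/6)*(1/6) = 1/36.
# i=1: s=1.5  -> (1/2, 1/2); (1/2)*(1/2) = 1/4 for the centre pixel.
# i=2: s=17/6 -> (1/6, 5/6).
# Mixed row/column weights: (5/6)*(1/2) = 5/12 and (1/6)*(1/2) = 1/12.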

  • Interesting approach. I think that if you think of pixels as little squares rather than as sample points, this approach makes sense. I always think of them as sample points, as you seem to do too, and then this approach looks funny. :) – Cris Luengo Apr 26 '19 at 13:05