
I am trying to warp the following image so that I get a fronto-parallel view of the bigger wall, the one on the left side of the image. However, I am not able to decide which points I should give to cv2.getPerspectiveTransform() so that I get the matrix which, if applied, would yield the desired result. the input image

The code I am using is:

import cv2
import numpy as np

circles = np.zeros((4,2), int)  # np.int is deprecated in recent NumPy
counter = 0

def mousePoints(event,x,y,flags,params):
    global counter
    if event == cv2.EVENT_LBUTTONDOWN:

        circles[counter] = x,y
        counter = counter + 1
        print(circles)

img = cv2.imread("DSC_0273.JPG")

img = cv2.resize(img,(1500,1000))
q = 0
while True:

    if counter == 4:
        q = q+1

        height1,width1 = 1080,1920
        pts1 = np.float32([circles[0],circles[1],circles[2],circles[3]])
        width = np.sqrt((circles[1][0] - circles[0][0])**2 + (circles[1][1] - circles[0][1])**2)
        height =  np.sqrt((circles[2][1] - circles[0][1])**2 + (circles[2][0] - circles[0][0])**2)
        width = int(np.round(width))
        height = int(np.round(height))
        x1,y1 = circles[0]
    
        pts2 = np.float32([[x1,y1],[(x1+width),y1],[(x1+width),(y1+height)],[x1,(y1+height)]])
        matrix = cv2.getPerspectiveTransform(pts1,pts2)

        if q == 1:
            print(matrix.shape)
            print(matrix)
        imgOutput = cv2.warpPerspective(img,matrix,(width1,height1))
        cv2.imshow("Output Image ", imgOutput)


    for x in range (0,4):
        cv2.circle(img,(circles[x][0],circles[x][1]),3,(0,255,0),cv2.FILLED)

    cv2.imshow("Original Image ", img)
    cv2.setMouseCallback("Original Image ", mousePoints)
    cv2.waitKey(1)

So basically, I click 4 points and my code finds the warping matrix that maps the area enclosed within those 4 points to a rectangle: the first point I give is mapped to the same pixel location, and the other three points are adjusted so that the enclosed area becomes a rectangle. To extrapolate this, I apply the same matrix over the whole image. A set of points (the 4 points given through mouse clicks) I tried out is: [[349, 445], [396, 415], [388, 596], [338, 610]]. The result I got is: the output image

  • Please edit your code into your question and show the points you’ve tried, and the result you get. – DisappointedByUnaccountableMod Sep 04 '20 at 19:21
  • Typically, described-and-matched keypoints like SIFT, SURF, or ORB are given, since they are designed to be matched across different images. – Micka Sep 04 '20 at 21:19
  • @barny As per what you had asked I have added my code, a set of points I used, and the output I got. – adarsh subramanian Sep 05 '20 at 07:39
  • @Micka I am afraid I don't clearly understand what you meant there. Can you please elaborate a bit more? – adarsh subramanian Sep 05 '20 at 07:39
  • Please fix the indenting of your code - it won’t execute at the moment. Imagine I want to try your code for myself, I need to be able to paste it into a file and run it - so I need a [mre]. – DisappointedByUnaccountableMod Sep 05 '20 at 07:50
  • I can’t work out where your points are, except that the first one appears to be just in front of the photographer. Choose four points that are a perspective-distorted rectangular plane area on the flat plane of the frontage, as broadly spread as you can. The verticals of your distorted rectangle must be vertical, the “horizontals” must (when extended) converge at the vanishing point. – DisappointedByUnaccountableMod Sep 05 '20 at 08:01
  • @barny I am sorry, I did not notice that when I pasted the code, the indentations were distorted. I have corrected it now. – adarsh subramanian Sep 05 '20 at 09:47
  • @barny If you open the input image that I have uploaded, there are 4 green dots. Those 4 are the points I gave. In the output also, you can see those 4 points warped into a rectangle. As per what you have said, the horizontals must, when extended, meet at the vanishing point. How can we accurately give such points so that verticals are vertical and horizontals converge at the vanishing point? – adarsh subramanian Sep 05 '20 at 09:49
  • Your code still doesn't work but at least there isn't a syntax error any more - 1) the resize fails for me using the image I downloaded from your question which is 1000x1500, and 2) it gets stuck in the `while True:` loop so of course it never gets to the code lower down to display the image and accept clicks. – DisappointedByUnaccountableMod Sep 05 '20 at 12:44
  • @barny The code does work for me. I have indeed given mouse clicks and it has warped it for me. Actually, the original image is a 4000×6000 image. However, Stack Overflow only allows images up to 2 MB, which is why I had to resize it to 1000×1500. I think this resizing (and perhaps distortions while uploading and downloading, I am not sure) is causing the error for you. – adarsh subramanian Sep 05 '20 at 16:50

1 Answer


I can't get your code to work, which is frankly a real pain when the code in a question isn't a minimal reproducible example; I only persisted because I was interested in the problem. I don't understand how your code is supposed to collect the four mouse clicks and then process the perspective, because the indenting is broken.

One problem with having to click points every run is that it's difficult to get comparable runs - so I collected clicks once and used those clicks to check things out.

One problem I ran into is the sequence of clicks. AFAICT the correct sequence to get a working transform is top-left, top-right, bottom-left, bottom-right. Confusingly, the sequence of points you listed in your question isn't in that order, and I had to swap the last two values in pts2 to correspond to the correct sequence. Maybe both swaps compensate for each other; I ran out of energy trying to second-guess the non-working code.
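
If you want to avoid depending on the click order entirely, a common trick is to sort the four points programmatically before building pts1. Here is a sketch of the usual sum/difference heuristic (the function name `order_points` is my own, not from the question's code):

```python
import numpy as np

def order_points(pts):
    # Order four (x, y) points as top-left, top-right, bottom-left,
    # bottom-right. Heuristic: the top-left corner has the smallest x+y
    # sum and bottom-right the largest; the top-right corner has the
    # smallest y-x difference and bottom-left the largest.
    pts = np.array(pts, dtype=np.float32)
    s = pts.sum(axis=1)               # x + y for each point
    d = np.diff(pts, axis=1).ravel()  # y - x for each point
    return np.float32([pts[np.argmin(s)],   # top-left
                       pts[np.argmin(d)],   # top-right
                       pts[np.argmax(d)],   # bottom-left
                       pts[np.argmax(s)]])  # bottom-right
```

With this in place the four clicks can arrive in any order and pts1 still comes out in the sequence the transform expects.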

Your points are relatively close together - I think you'll get better results with more widely spread points. Note "better" not "perfect" - I don't think there is a "perfect" in what you're trying to do because the perspective distortion is so great and WarpPerspective can't magically reproject for different depths in the image.

It's also difficult to compare the output with the input because the locations of the clicks aren't visible on input or output.

The other thing I'm not sure about is the resizing: the source image of the temple I downloaded is already 1500 wide by 1000 high, so I'm ignoring that step.

Having said all that, I think what you're getting is pretty much what you should be getting. Yes, it's very distorted, but warpPerspective is a simple 2D operation which doesn't claim to do reprojection or to allow for lens distortion.
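
To make that concrete: a perspective warp is just a 3×3 homography applied to homogeneous 2D coordinates, with no depth term anywhere. A minimal sketch (the matrix values below are arbitrary, purely for illustration):

```python
import numpy as np

# A perspective warp maps homogeneous points: [x', y', w']^T = H @ [x, y, 1]^T,
# followed by division by w'. This per-point mapping is all warpPerspective
# does for every pixel -- a flat 2D remap with no notion of scene depth.
H = np.array([[1.2,   0.1,  5.0],
              [0.0,   1.1, -3.0],
              [0.001, 0.0,  1.0]])  # arbitrary example homography

def apply_homography(H, x, y):
    px, py, pw = H @ np.array([x, y, 1.0])
    return px / pw, py / pw

print(apply_homography(H, 100.0, 50.0))
```

The division by w' is what produces the perspective foreshortening; anything not on the plane the homography was fitted to will look distorted, which is exactly what you see inside the arches.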

Here's my simplified version of your code which uses four fixed points. I've overlaid them on the source image before transformation so they're visible in the output image; you could also overlay grid lines on the input image and see what these look like in the output image. Yes, the output does contain four green dots; the right-hand two are quite tiny due to the transformation. You can check that, for example, the tips of the arches are pretty much aligned. Obviously the 3-D content within them looks weird, but as I said, there's no magic wand.

import cv2
import numpy as np

# Fixed points instead of mouse clicks, for repeatable runs
circles = [(349, 473), (903, 158), (336, 713), (918, 758)]

img = cv2.imread("temple.JPG")

for x in range (0,4):
    cv2.circle(img,(circles[x][0],circles[x][1]),3,(0,255,0),cv2.FILLED)

cv2.imshow("Original Image ", img)

cv2.waitKey(-1)
    
height1,width1 = 1080,1920

pts1 = np.float32([circles[0],circles[1],circles[2],circles[3]])
width = np.sqrt((circles[1][0] - circles[0][0])**2 + (circles[1][1] - circles[0][1])**2)
height =  np.sqrt((circles[2][1] - circles[0][1])**2 + (circles[2][0] - circles[0][0])**2)
width = int(np.round(width))
height = int(np.round(height))
x1,y1 = circles[0]

print( f"{x1=} {y1=} {width=} {height=}")

# NOTE the third and fourth values are swapped from the original code
pts2 = np.float32([[x1,y1],[(x1+width),y1],[x1,(y1+height)],[(x1+width),(y1+height)]])
matrix = cv2.getPerspectiveTransform(pts1,pts2)

print(matrix.shape)
print(matrix)

imgOutput = cv2.warpPerspective(img,matrix,(width1,height1))
cv2.imshow("Output Image ", imgOutput)

cv2.waitKey(-1)

Input image with input points overlaid:

full input image with green dots on selected points

Cropped part of output image:

cropper part of output image

  • I am really sorry that the indentation was messed up in my code. It missed my attention. – adarsh subramanian Sep 05 '20 at 10:22
  • The order I used throughout is top-left, top-right, bottom-right then bottom-left. I think that is the cause of confusion. The code does work without any errors for me, but I was confused and unable to clearly give 4 points which would give the desired result. – adarsh subramanian Sep 05 '20 at 16:55
  • Also, you have advised giving more spread-out points rather than 4 close points. Why is this? The wall is a rigid body which is straight, so if I get the transformation matrix for a small patch within the wall and apply it to the whole wall, I should technically get the whole wall warped correctly, right? – adarsh subramanian Sep 05 '20 at 16:57
  • Accuracy is better with points more spread out. The concept is to identify points on horizontal and vertical straight lines in the building. So use the lines on the building and try to imagine how they form a rectangle when viewing looking straight on perpendicular to the face you want to rectify. Keep in mind that the perspective transformation preserves all straight lines. Normally the ground plane would form a straight line. So where the building and ground intersect would be a good place. However, your image has some barrel distortion from a wide angle lens. So the ground is curved a bit. – fmw42 Sep 05 '20 at 17:14