In general, you need to "calibrate" the camera. That means to estimate lens distortion coefficients, optical center, focal lengths, perhaps even shear coefficients. All of that depends on the camera sensor and the lens in front of it (including focus and zoom). That is usually done with calibration patterns.
In place of a proper calibration, you can assume some defaults.
import numpy as np
import cv2 as cv

im = cv.imread("L91xP.jpg")
(height, width) = im.shape[:2]
assert (width, height) == (1280, 720), "or whatever else"

K = np.eye(3)
K[0,0] = K[1,1] = 1000 # focal length in pixels; 500-5000 is common
K[0:2, 2] = (width-1)/2, (height-1)/2
# array([[1000. ,    0. ,  639.5],
#        [   0. , 1000. ,  359.5],
#        [   0. ,    0. ,    1. ]])
The optical center sits at the image center, and the focal length is a moderate guess. A wrong focal length here will only affect the distortion coefficients, and only as a scale factor.
Distortion coefficients can be assumed to be all 0. You can tweak them and watch what happens.
dc = np.float32([-0.54, 0.28, 0. , 0. , 0. ]) # k1, k2, p1, p2, k3
Undistortion... can be applied to entire pictures or to individual points.
Points:
- either
cv.undistortImagePoints(impts, K, dc)
(newish API because undistortPoints did more than some people need)
- or
cv.perspectiveTransform(cv.undistortPoints(impts, K, dc), K)
(perspectiveTransform undoes some of the work of undistortPoints)
Images:
im_undistorted = cv.undistort(im, K, dc)
Now you have image and points without lens distortion.
modelpts = np.float32([
[45., 0.],
[90., 0.],
[90., 60.],
[45., 60.]]) * 15 # 15 pixels per foot
impts = np.float32([
    [511.54881, 184.64497],
    [758.16124, 141.19525],
    [1159.37185, 191.21864],
    [1153.4168, 276.2696]]).reshape((-1, 1, 2))
impts_undist = np.float32([
[ 508.38733, 180.3246 ],
[ 762.08234, 133.98148],
[1271.5339 , 154.91203],
[1250.6611 , 260.52057]]).reshape((-1, 1, 2))
Perspective transform requires at least four pairs of points. In each pair, one point is given in one perspective (the side view of the field), and the other point is given in the other perspective (the top-down/"model" view).
H = cv.getPerspectiveTransform(impts_undist, modelpts)
You can chain some more transformations to that homography (H), like translation/scaling in either image space, to move the picture where you want it. That's just matrix multiplication.
# translate by some amount in the X and Y dimension (model space)
Tscale = np.array([
[ 1., 0., 75.], # arbitrary values
[ 0., 1., 25.],
[ 0., 0., 1.]])
And then you apply the homography to the undistorted input image:
topdown = cv.warpPerspective(im_undistorted, H, dsize=(90*15, 60*15))
Those are the building blocks. You can then build something interactive using createTrackbar
to mess with the distortion coefficients until the output looks straight-ish.
Don't expect that to become perfect. Besides the distortion coefficients, the optical center might not really be where it's supposed to be. And the picked points on the side view may be off by a pixel or so, but at such a shallow angle and distance across the field, that translates into several feet.
It's really best to get a calibration pattern and wave it (well... hold it very still!) in front of the camera. I'd recommend "ChArUco" boards. They're the easiest to get usable results from, because you don't need to keep the entire board in view.
Here are some pictures:
input as you've given it...
undistorted:

top-down view:
(to get some more of the surroundings, multiply some translation onto the homography, like H2 = T @ H, to shift the result toward the bottom right a little, and give warpPerspective a larger dsize)