I'm trying to find the camera pose with solvePnP() for each frame of a video taken by a moving camera. It works for most frames, but for some frames the solutions returned by solvePnP() are nowhere near the solutions for the previous frame(s). Providing an extrinsic guess won't help, because it doesn't add better solutions; it just picks whichever generic solution is closest to the guess.
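To make that last point concrete, here is a minimal sketch of the selection I mean. It is not OpenCV's implementation; `closest_candidate` is a hypothetical helper, and `previous_position` is a made-up value for illustration. The two candidates are the distinct frame 4 positions (rounded) from the output further down in this post.

```python
import math

def closest_candidate(candidates, guess):
    """Return (index, candidate) of the candidate position nearest to the guess."""
    best = min(range(len(candidates)), key=lambda i: math.dist(candidates[i], guess))
    return best, candidates[best]

# The two distinct candidate camera positions for the problem frame (rounded)
candidates = [(-14.93, -19.09, 22.19), (20.64, -1.86, -24.50)]

# HYPOTHETICAL previous-frame camera position, invented for this example
previous_position = (-10.0, 20.0, 20.0)

index, position = closest_candidate(candidates, previous_position)
jump = math.dist(position, previous_position)
print("chosen candidate:", index, position)
print("distance from previous position:", round(jump, 2))
```

Choosing the nearest candidate only helps if one of the candidates is actually near the previous pose. In this sketch even the nearest one is still a large jump away.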

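[Figure: 3D scatter plot of the camera positions from solvePnPGeneric() for frames 0-3 and frame 4, together with the object points]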

Keep in mind that these are the generic solutions, so there are two sets of solutions: one above the ground and one below. Obviously the camera did not teleport. The code below reproduces the graph above. Notice that there are no big gaps in the 2D image points, unlike what the output 3D positions would suggest.

import numpy as np
import cv2
import matplotlib.pyplot as plt

# Create Chart
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
legend_list = np.array([])

def plot_path(v, legend_string="point"):
    global legend_list
    legend_list = np.append(legend_list, legend_string)
    x = v[:,0]
    y = v[:,1]
    z = v[:,2]
    ax.scatter(x, y, z)

def rtvec_to_matrix(rvec=(0,0,0), tvec=(0,0,0)):
    rvec = np.asarray(rvec)
    tvec = np.asarray(tvec)
    T = np.eye(4)
    (R, jac) = cv2.Rodrigues(rvec)
    T[:3, :3] = R
    T[:3, 3] = tvec.squeeze()
    return T

# camera intrinsic matrix and distortion coefficients (k1, k2, p1, p2, k3)
proj_matrx = np.array([[1635.3757, 0.0, 964.7231 ], [0.0, 1633.0149, 567.26917], [0.0, 0.0, 1.0]])
distortion = np.array([0.20745164, -1.0343311, 0.00411783, 0.0013199, 1.5680445])

obj_points = np.array([[-14.36121, 37.83202, 10.06615], [18.16821, 33.45271, 15.32258], [19.00584, 33.36314, 1.37371], [-13.0867, 21.2781, 0.88137]])

img_points = np.array([[[441.0343037,  242.94053653],[1393.43216275,   53.03746055],[1374.65125528,  426.5919237 ],[577.20988054, 699.05925325]],
                [[436.7139912,  237.08897403],[1389.29935025,   49.56089805],[1369.94813028,  423.5372362 ],[570.15519304, 692.91862825]],
                [[431.2218037,  232.04209903],[1385.04935025,   46.85777305],[1365.24500528,  421.2481737 ],[562.33488054, 687.24675325]],
                [[425.7921162,  227.89366153],[1380.23685025,   44.87339805],[1360.01063028,  419.7481737 ],[553.48331804, 683.11394075]],
                [[419.9327412,  224.18272403],[1375.22903775,   43.54527305],[1354.29188028,  418.7872362 ],[545.32706804, 679.41862825]]])

solutions = np.empty((0,3))
frame_4_solutions = np.empty((0,3))

# for each frame
for i, _ in enumerate(img_points):

    _, rvecg, tvecg, reproj_err = cv2.solvePnPGeneric(obj_points, img_points[i], proj_matrx, distortion, flags=cv2.SOLVEPNP_AP3P)

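    # for each generic solution, invert the world-to-camera transform [R|t];
    # the inverse applied to the origin is the camera position in world coordinates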
    for k,_ in enumerate(tvecg):
        position = np.linalg.inv(rtvec_to_matrix(rvecg[k], tvecg[k])).dot(np.array([0,0,0,1]))[:3]
        if i == 4:
            frame_4_solutions = np.concatenate((frame_4_solutions, [position]))
            frame_4_reproj_err = reproj_err
        else:
            solutions = np.concatenate((solutions, [position]))


# plot points
plot_path(solutions, "Solutions Frames 0-3")
plot_path(frame_4_solutions, "Solutions Frame 4")
plot_path(obj_points, "obj points")

print("Frame 4 Solutions:")
print(frame_4_solutions)
print("Frame 4 Reprojection Error:")
print(frame_4_reproj_err)

plt.xlabel("X")
plt.ylabel("Y")
plt.legend(legend_list)
ax.set_xlim3d(-25.0, 35.0)
ax.set_ylim3d(-24.0, 36.0)
ax.set_zlim3d(-25.0, 35.0)
plt.show()

Output


Frame 4 Solutions:
[[-14.93125033 -19.09029517  22.1930716 ]
 [-14.93125033 -19.09029517  22.1930716 ]
 [-14.93125008 -19.09029526  22.19307174]
 [ 20.63741944  -1.86221876 -24.49901755]]
Frame 4 Reprojection Error:
[[  6.25958972]
 [  6.25958972]
 [  6.25959019]
 [295.06886244]]
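The printed candidates can also be grouped mechanically. The sketch below is a small post-processing step, not part of OpenCV: `distinct_solutions` is a hypothetical helper, and the tolerance of 1e-3 (in the same units as the object points) is an assumption. The numbers are copied from the output above.

```python
import math

# Candidate positions and reprojection errors copied from the printed output
positions = [
    (-14.93125033, -19.09029517, 22.1930716),
    (-14.93125033, -19.09029517, 22.1930716),
    (-14.93125008, -19.09029526, 22.19307174),
    (20.63741944, -1.86221876, -24.49901755),
]
errors = [6.25958972, 6.25958972, 6.25959019, 295.06886244]

def distinct_solutions(positions, errors, tol=1e-3):
    """Merge candidates closer than tol (assumed value), keeping each group's lowest error."""
    kept = []  # each entry is [position, error]
    for pos, err in zip(positions, errors):
        for entry in kept:
            if math.dist(entry[0], pos) < tol:
                entry[1] = min(entry[1], err)
                break
        else:
            # no existing group is close enough, so start a new one
            kept.append([pos, err])
    # lowest reprojection error first
    return sorted(kept, key=lambda entry: entry[1])

unique = distinct_solutions(positions, errors)
for pos, err in unique:
    print(pos, err)
```

With this grouping, frame 4 has only two distinct candidates, and they are ranked by reprojection error.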

Three of the solutions for the problem frame are right on top of each other, and the fourth is below the floor. "Noisy" is the only way I can describe the results, which seems like a job for a Kalman filter, as this post suggests. But I tracked the 2D points manually and I guarantee their accuracy, and the object points look accurate as well. Why would a Kalman filter help?
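As a sanity check on the 2D tracks, the sketch below measures how far each tracked point moves between consecutive frames. It uses only the standard library, the `img_points` values are copied from the code above, and `per_frame_max_step` is a hypothetical helper name.

```python
import math

# 2D image points per frame (5 frames x 4 points), copied from img_points
img_points = [
    [(441.0343037, 242.94053653), (1393.43216275, 53.03746055), (1374.65125528, 426.5919237), (577.20988054, 699.05925325)],
    [(436.7139912, 237.08897403), (1389.29935025, 49.56089805), (1369.94813028, 423.5372362), (570.15519304, 692.91862825)],
    [(431.2218037, 232.04209903), (1385.04935025, 46.85777305), (1365.24500528, 421.2481737), (562.33488054, 687.24675325)],
    [(425.7921162, 227.89366153), (1380.23685025, 44.87339805), (1360.01063028, 419.7481737), (553.48331804, 683.11394075)],
    [(419.9327412, 224.18272403), (1375.22903775, 43.54527305), (1354.29188028, 418.7872362), (545.32706804, 679.41862825)],
]

def per_frame_max_step(frames):
    """Largest pixel displacement of any tracked point between consecutive frames."""
    steps = []
    for prev, curr in zip(frames, frames[1:]):
        steps.append(max(math.dist(p, c) for p, c in zip(prev, curr)))
    return steps

steps = per_frame_max_step(img_points)
print([round(s, 2) for s in steps])
```

The step into frame 4 is about the same size as the earlier steps, so the jump in the 3D positions is not mirrored by a jump in the 2D tracks.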

What am I doing wrong? Is this an issue with noise? Does anyone else have this issue?
