I am trying to estimate the head pose of single images mostly following this guide: https://towardsdatascience.com/real-time-head-pose-estimation-in-python-e52db1bc606a
The detection of the face works fine - if i plot the image and the detected landmarks they line up nicely.
I am estimating the camera matrix from the image, and assume no lens distortion:
size = image.shape
focal_length = size[1]
center = (size[1]/2, size[0]/2)
camera_matrix = np.array([[focal_length, 0, center[0]],
[0, focal_length, center[1]],
[0, 0, 1]], dtype="double")
dist_coeffs = np.zeros((4, 1)) # Assuming no lens distortion
I am trying to get the head pose by matching points in the image with points in the 3D model using solvePNP:
# 3D-model points to which the points extracted from an image are matched:
model_points = np.array([
(0.0, 0.0, 0.0), # Nose tip
(0.0, -330.0, -65.0), # Chin
(-225.0, 170.0, -135.0), # Left eye corner
(225.0, 170.0, -135.0), # Right eye corner
(-150.0, -150.0, -125.0), # Left Mouth corner
(150.0, -150.0, -125.0) # Right mouth corner
])
image_points = np.array([
shape[30], # Nose tip
shape[8], # Chin
shape[36], # Left eye left corner
shape[45], # Right eye right corne
shape[48], # Left Mouth corner
shape[54] # Right mouth corner
], dtype="double")
success, rotation_vec, translation_vec) = \
cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)
finally, I am getting the euler angles from the rotation:
rotation_mat, _ = cv2.Rodrigues(rotation_vec)
pose_mat = cv2.hconcat((rotation_mat, translation_vec))
_, _, _, _, _, _, angles = cv2.decomposeProjectionMatrix(pose_mat)
now the azimuth is what i would expect - it is negative if i look to the left, zero in the middle and positive to the right.
the elevation however is strange - if i look in the middle it has a constant value but the sign is random - changing from image to image (the value is around 170).
When i look up the sign is positive and the value decreases the more i look up, When i look down the sign is negative and the value decreases the more i look down.
Can someone explain this output to me?