I want to estimate the yaw, pitch, and roll angles of a human face in a given image. As I understand it, I need to follow these steps:
- Use MediaPipe to find the face landmarks.
- Use the OpenCV-Python solvePnP function to produce a rotation vector.
- Pass the rotation vector to the OpenCV-Python Rodrigues function to get a rotation matrix.
- Finally, decompose the rotation matrix to get the pitch, yaw, and roll angles.
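To make the last two steps concrete, here is a minimal pure-Python sketch of what they compute. It is not OpenCV's implementation: `rodrigues` and `euler_angles_deg` are hypothetical helper names, and the angle convention (R = Rz(roll) · Ry(yaw) · Rx(pitch), with pitch about x, yaw about y, roll about z) is an assumption; other conventions give different numbers for the same matrix.

```python
import math

def rodrigues(rvec):
    """Convert an axis-angle rotation vector to a 3x3 rotation matrix.

    Same formula that cv2.Rodrigues applies; written out here for illustration.
    """
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)  # rotation angle in radians
    if theta < 1e-12:
        # No rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def euler_angles_deg(R):
    """Decompose R into (pitch, yaw, roll) in degrees.

    Assumed convention: R = Rz(roll) * Ry(yaw) * Rx(pitch).
    """
    sy = math.hypot(R[0][0], R[1][0])
    if sy > 1e-6:
        pitch = math.atan2(R[2][1], R[2][2])
        yaw = math.atan2(-R[2][0], sy)
        roll = math.atan2(R[1][0], R[0][0])
    else:
        # Near gimbal lock: roll is not uniquely defined, so fix it to zero.
        pitch = math.atan2(-R[1][2], R[1][1])
        yaw = math.atan2(-R[2][0], sy)
        roll = 0.0
    return tuple(math.degrees(a) for a in (pitch, yaw, roll))

# Example: a rotation vector describing 30 degrees about the y axis.
pitch, yaw, roll = euler_angles_deg(rodrigues((0.0, math.radians(30.0), 0.0)))
```

In practice the rotation vector would come from solvePnP, and the matrix from cv2.Rodrigues; the sketch only shows what the decomposition step does with that matrix.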
But I do not understand how to supply the first two arguments to solvePnP, which are the 3D object points and the 2D image points. How can I identify those points using MediaPipe face landmarks? I found the blog post "Face pose estimation (calculate Euler angles)", which uses six points: nose tip, chin, left eye left corner, right eye right corner, left mouth corner, and right mouth corner. But there the points are entered manually and would differ for each image, and I couldn't find MediaPipe landmark labels for them. If I manually find the landmarks that correspond to the six points used in the blog, should their x and y coordinates be used as the 2D image points, and their x, y, and z coordinates as the 3D object points?
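One common reading of this setup, sketched below under several assumptions, is that the 2D image points are the landmarks' x and y values scaled to pixels, while the 3D object points come from a fixed generic face model rather than from the image itself. The landmark indices (1, 152, 33, 263, 61, 291) are taken from community examples and should be checked against the Face Mesh landmark map; the model coordinates are generic values that circulate in head-pose tutorials, not values confirmed by the blog mentioned above. All function names are hypothetical, and the camera matrix is a rough approximation that assumes no lens distortion.

```python
# Assumed mapping from the six named points to MediaPipe Face Mesh indices.
# Which eye or mouth corner counts as "left" depends on whether the image is mirrored.
LANDMARK_IDS = {
    "nose_tip": 1,
    "chin": 152,
    "left_eye_left_corner": 33,
    "right_eye_right_corner": 263,
    "left_mouth_corner": 61,
    "right_mouth_corner": 291,
}

# Assumed generic 3D face model (arbitrary units, nose tip at the origin).
MODEL_POINTS = {
    "nose_tip": (0.0, 0.0, 0.0),
    "chin": (0.0, -330.0, -65.0),
    "left_eye_left_corner": (-225.0, 170.0, -135.0),
    "right_eye_right_corner": (225.0, 170.0, -135.0),
    "left_mouth_corner": (-150.0, -150.0, -125.0),
    "right_mouth_corner": (150.0, -150.0, -125.0),
}

def build_point_sets(landmarks, width, height):
    """Return (object_points, image_points) in matching order.

    landmarks: sequence of normalized (x, y) or (x, y, z) tuples, indexed by landmark id.
    """
    object_points, image_points = [], []
    for name, idx in LANDMARK_IDS.items():
        x_norm, y_norm = landmarks[idx][:2]
        image_points.append((x_norm * width, y_norm * height))  # pixels
        object_points.append(MODEL_POINTS[name])  # fixed model coordinates
    return object_points, image_points

def approximate_camera_matrix(width, height):
    """Rough intrinsics: focal length ~ image width, principal point at the centre."""
    focal = float(width)
    return [[focal, 0.0, width / 2.0], [0.0, focal, height / 2.0], [0.0, 0.0, 1.0]]

def estimate_rotation_vector(landmarks, width, height):
    """Run cv2.solvePnP on the six correspondences; returns rvec or None."""
    import cv2  # imported lazily so the helpers above need only the standard library
    import numpy as np

    object_points, image_points = build_point_sets(landmarks, width, height)
    ok, rvec, tvec = cv2.solvePnP(
        np.array(object_points, dtype=np.float64),
        np.array(image_points, dtype=np.float64),
        np.array(approximate_camera_matrix(width, height), dtype=np.float64),
        np.zeros((4, 1)),  # assume zero lens distortion
    )
    return rvec if ok else None
```

With the legacy Face Mesh solution, each detected landmark exposes normalized x, y, and z attributes, so they would first need to be converted to tuples before being passed to `build_point_sets`. Because the object points are fixed, only the image points change from one image to the next.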