
I'm working on an AR application where the marker is a 3D object with a complex shape, so as a first step I'm trying to build a CAD-based recognition system.

As far as I have found, the main steps for building a 3D model from a set of images are:

  1. loop through the images and extract their features
  2. perform pairwise matching
  3. compute the 3D points, their corresponding descriptors, and the camera parameters for each image

Now my first question is: how should I determine the descriptor for each 3D point? As we know, a 3D point is extracted from a set of similar 2D features, meaning that there are many similar descriptors, each of which corresponds to a 2D point. Which of those descriptors should we choose? They are not exactly the same; they differ slightly from each other.

My other question is: the tutorial Real Time pose estimation of a textured object provided by OpenCV requires the model in .yaml format and the mesh in .ply format. How should I store my 3D structure in these file types? Are there any steps or tools that can help with this?
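From what I can tell, an ASCII .ply file is just plain text, so a minimal mesh can be written by hand, and the tutorial's .yaml model file seems to be written with OpenCV's FileStorage. A small sketch of the .ply part, in plain Python (the helper name is my own, just to illustrate the layout):

```python
def write_ascii_ply(path, vertices, faces):
    """Write a minimal ASCII .ply mesh file (vertices + triangular faces)."""
    lines = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(vertices)}",
        "property float x",
        "property float y",
        "property float z",
        f"element face {len(faces)}",
        "property list uchar int vertex_indices",
        "end_header",
    ]
    # one line per vertex, then one line per face ("3" = triangle)
    lines += [f"{x} {y} {z}" for x, y, z in vertices]
    lines += ["3 " + " ".join(map(str, f)) for f in faces]
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
```

For example, `write_ascii_ply("tri.ply", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)])` writes a single-triangle mesh that mesh viewers such as MeshLab can open.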

Thanks in advance.

Lisa.s

1 Answer


Quick answer: these are the steps you should need:

  • have a CAD model of the textured object
  • "learn" the keypoints:
    • for each "training" images, detect and retain the keypoints detected on the image
    • for each 2D keypoint, compute the correspond 3D object coordinate using the 3D CAD mesh and extract the corresponding descriptor
    • save in a file the list of 3D object coordinates and the corresponding list of descriptors
  • to detect the object:
    • detect the keypoints in the desired images
    • match the current keypoints with those saved
    • estimate the object pose using a robust approach (RANSAC) with solvePnPRansac(): the 3D object points are the 3D object coordinates saved in the training step, and the 2D image points are the image coordinates of the keypoints currently detected and matched

The tutorial should more or less do something similar.

The "tricky" part should be to calculate the coordinate of the object 3D point for a given 2D image point and the camera pose:

  • you can see here how it is done in the OpenCV tutorial Real Time pose estimation of a textured object

What I would do (it can be a little different from the tutorial code) for a 2D image point (e.g. one keypoint location):

  • transform the 2D image point to the normalized camera frame (z=1) using the intrinsic matrix, see undistortPoints()
  • test whether the current 2D image point belongs to the object: intersect the image ray with the triangle mesh at the current camera pose (you will need to test every triangle)
  • if the current 2D image point belongs to the object, the corresponding 3D object point (in the object frame) can be the nearest of the 3 points that form the triangle, or you can compute the intersection point between the image ray and the triangle
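A minimal sketch of the two geometric steps above in plain Python (function names are mine; a real implementation would use cv2.undistortPoints for the back-projection, and the ray must be transformed into the object frame with the camera pose before testing the mesh):

```python
def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project a pixel to a ray direction in the camera frame (z = 1 plane)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)

def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle test; returns the 3D hit point or None."""
    sub = lambda a, b: (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    dot = lambda a, b: a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    cross = lambda a, b: (a[1] * b[2] - a[2] * b[1],
                          a[2] * b[0] - a[0] * b[2],
                          a[0] * b[1] - a[1] * b[0])
    e1, e2 = sub(v1, v0), sub(v2, v0)
    h = cross(direction, e2)
    a = dot(e1, h)
    if abs(a) < eps:            # ray parallel to the triangle plane
        return None
    f = 1.0 / a
    s = sub(origin, v0)
    u = f * dot(s, h)
    if u < 0.0 or u > 1.0:      # outside the triangle
        return None
    q = cross(s, e1)
    v = f * dot(direction, q)
    if v < 0.0 or u + v > 1.0:  # outside the triangle
        return None
    t = f * dot(e2, q)
    if t <= eps:                # intersection behind the ray origin
        return None
    return tuple(o + t * d for o, d in zip(origin, direction))
```

Loop `ray_triangle_intersect` over all mesh triangles and keep the hit with the smallest distance to the camera (the point on the front surface); that hit is the 3D object coordinate to store with the keypoint's descriptor.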
Catree
  • Thanks for your answer! I would really appreciate it if you could clarify the step "for each 2D keypoint, compute the corresponding 3D object coordinate using the 3D CAD mesh and extract the corresponding descriptor" and explain it in detail. – Lisa.s Jun 24 '17 at 02:16
  • @Lisa.s I have completed my answer. Unfortunately, this topic needs some maths and I don't have the time to detail my answer further. If you want more information, see homogeneous transformations to learn how to transform a 3D point from one frame to another (it is just a matrix multiplication in the end), the perspective camera model, and some geometry for the intersection between a line and a triangle. – Catree Jun 24 '17 at 21:11
  • I am sorry for bothering you @Catree, but I'm still stuck on one point and hope you can help me. Regarding "save in a file the list of 3D object coordinates and the corresponding list of descriptors": should the same 3D coordinate be stored many times, each with the corresponding descriptor of a keypoint in one of the training images? Or is there some way to select only one descriptor for a 3D coordinate? – Lisa.s Jul 07 '17 at 01:57
  • @Lisa.s In my opinion, storing the same 3D object coordinates multiple times with their corresponding descriptors should not be an issue. When you perform the feature matching, you will try to match the keypoints in the current image with those detected in the training images. This should reduce the probability of having the same 3D object coordinates for the pose estimation method. Moreover, the RANSAC pose estimation should treat such points as outliers. – Catree Jul 07 '17 at 16:48
  • First, I want to say thanks, and sorry for bothering you @Catree. Everything is clear to me now except for one point: in order to get the 2D-3D correspondences (at the learning step) I need to know the camera pose! In the tutorial they manually insert the 8 points that determine the cube structure and rely on them to obtain the pose, but for more complicated objects, like a doll or a 3D environment, how can I determine the camera pose for an image at the learning step in order to perform the re-projection and obtain the descriptors that correspond to a 3D point? – Lisa.s Jul 26 '17 at 16:08
  • One possibility I see: as you should have the CAD model of the object, use specific points (at least 4) as "landmarks" and compute the current pose (with `solvePnP`). The 3D object points are the landmarks you have chosen, and the 2D image points are the landmarks projected in the image; just select them manually. You will have to do this for all the training images. To get the model of an object, you can use 3D reconstruction software coupled with a depth sensor (e.g. an RGB-D camera like the Kinect or the Intel RealSense). – Catree Jul 26 '17 at 16:45
  • You are a gift from God! I think everything is clear now, I can't wait to test it! – Lisa.s Jul 26 '17 at 23:34
  • @Catree Could you please tell us how to prepare such a CAD model? – Feb 15 '20 at 22:20