
I am having a little problem. I wrote a program that extracts a set of three-dimensional points in each frame using a camera and depth information. The points are in the camera coordinate system, which means the origin is at the camera center, x is the horizontal distance, y the vertical distance, and z the distance from the camera (along the optical axis). Everything is in meters. E.g. the point (2, -1, 5) would be two meters right, one meter below, and five meters along the optical axis of the camera.

I calculate these points in each time frame and also know the correspondences, i.e. I know which 3D point at time t-1 corresponds to which 3D point at time t.

My goal now is to calculate the motion of the camera in each time frame in my world coordinate system (with z pointing up representing the height). I would like to calculate relative motion but also the absolute one starting from some start position to visualize the trajectory of the camera.


This is an example data set of one frame with the current (left) and the previous 3D location (right) of the points in camera coordinates:

-0.174004 0.242901 3.672510 | -0.089167 0.246231 3.646694 
-0.265066 -0.079420 3.668801 | -0.182261 -0.075341 3.634996 
0.092708 0.459499 3.673029 | 0.179553 0.459284 3.636645 
0.593070 0.056592 3.542869 | 0.675082 0.051625 3.509424 
0.676054 0.517077 3.585216 | 0.763378 0.511976 3.555986 
0.555625 -0.350790 3.496224 | 0.633524 -0.354710 3.465260 
1.189281 0.953641 3.556284 | 1.274754 0.938846 3.504309 
0.489797 -0.933973 3.435228 | 0.561585 -0.935864 3.404614 

Since I would like to work with OpenCV if possible, I found the estimateAffine3D() function in OpenCV 2.3, which takes two 3D point vectors as input and estimates the affine transformation between them using RANSAC.

As output I get a 3x4 transformation matrix.

I already tried to make the calculation more accurate by tuning the RANSAC parameters, but the transformation matrix often shows a translation that is quite big. As you can see in the sample data, the actual movement is usually quite small.

So I wanted to ask whether anybody has an idea what else I could try. Does OpenCV offer other solutions for this?

Also, if I have the relative motion of the camera in each time frame, how would I convert it to world coordinates? And how would I then get the absolute position, starting from (0, 0, 0), so that I have the camera position (and orientation) for each time frame?

Would be great if anybody could give me some advice!

Thank you!
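On the camera-to-world question: under the conventions stated above, one possible fixed axis remap can be sketched in NumPy. The exact mapping is an assumption and the signs must be adjusted to the real setup:

```python
import numpy as np

# One possible remap from the camera frame as described in the question
# (x right, y up -- since y = -1 means "one meter below" -- z along the
# optical axis) to a world frame with z up.  This mapping is an assumption;
# adjust the signs to your actual conventions.  Note it has determinant -1,
# because the camera frame as described is left-handed; with the more common
# right-handed camera convention (y pointing down), use world_z = -cam_y.
CAM_TO_WORLD = np.array([
    [1.0, 0.0, 0.0],   # world x = camera x (right)
    [0.0, 0.0, 1.0],   # world y = camera z (forward)
    [0.0, 1.0, 0.0],   # world z = camera y (height)
])

p_cam = np.array([2.0, -1.0, 5.0])   # the example point from the question
p_world = CAM_TO_WORLD @ p_cam       # 2 m right, 5 m ahead, 1 m below origin
```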

UPDATE 1:

After @Michael Kupchick's nice answer I tried to check how well the estimateAffine3D() function in OpenCV works. So I created two small test sets of six point pairs that contain only a translation, no rotation, and looked at the resulting transformation matrices:

Test set 1:

1.5 2.1 6.7 | 0.5 1.1 5.7
6.7 4.5 12.4 | 5.7 3.5 11.4
3.5 3.2 1.2 | 2.5 2.2 0.2
-10.2 5.5 5.5 | -11.2 4.5 4.5
-7.2 -2.2 6.5 | -8.2 -3.2 5.5
-2.2 -7.3 19.2 | -3.2 -8.3 18.2

Transformation Matrix:

1           -1.0573e-16  -6.4096e-17  1
-1.3633e-16 1            2.59504e-16  1
3.20342e-09 1.14395e-09  1            1

Test set 2:

1.5 2.1 0 | 0.5 1.1 0
6.7 4.5 0 | 5.7 3.5 0
3.5 3.2 0 | 2.5 2.2 0
-10.2 5.5 0 | -11.2 4.5 0
-7.2 -2.2 0 | -8.2 -3.2 0
-2.2 -7.3 0 | -3.2 -8.3 0

Transformation Matrix:

1             4.4442e-17  0   1
-2.69695e-17  1           0   1
0             0           0   0

--> This gives me two transformation matrices that look right at first sight...

Assuming this is right, how would I reconstruct the trajectory when I have such a transformation matrix in each time step?

  • Isn't it strange that the first row of your resulting transformation matrices is all zero? I mean the diagonal of rotation matrix should be all 1 or some close value, are you sure that you are reading the matrix correctly? – Michael Kupchick Mar 01 '12 at 11:23
  • Ohh right! I read the transformation matrix right but I had an error in one of the input vectors. So I changed the transformation matrices to how they look now which makes much more sense with the rotation and translation. Isn't that the kind of format I am looking for? Wouldn't the ICP algorithm give me about the same for this dataset? – Valentino Cazalet Mar 01 '12 at 12:23
  • The last matrix still looks strange; look at the 1 in the second row, third column. For the first matrix the result seems OK. – Michael Kupchick Mar 01 '12 at 13:48
  • The one from 3rd column second row was wrong, sorry... has to be a zero... -> changed it... – Valentino Cazalet Mar 01 '12 at 14:09
  • I think there should be 1 at the (3, 3) cell too – Michael Kupchick Mar 01 '12 at 14:53
  • By the way, camera motion is rigid body motion - rotation and translation. An affine transformation is more than this; maybe with noisy data it won't work well, since there may be another affine transformation (not only rotation and translation) that is optimal, so you get wrong results on real data. – Michael Kupchick Mar 01 '12 at 15:01
  • Ok thank you... I checked again and I don't see an error why it's not a 1 in (3,3)... might it be because they're all coplanar? I'm going to try it using the links you posted... maybe I'll find a better way for this... ;) – Valentino Cazalet Mar 01 '12 at 18:36
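The coplanarity guess in the last comment can be checked numerically. A plain least-squares affine fit (a stand-in for what estimateAffine3D solves, without the RANSAC part) on Test set 2 shows why the (3, 3) entry comes out 0: with every input z equal to 0, the third column of the system is unconstrained, and the minimum-norm least-squares solution sets the corresponding coefficients to zero:

```python
import numpy as np

# Test set 2 from the update: previous positions (right column), all lying
# in the z = 0 plane, moved by a pure translation of (1, 1, 0).
prev = np.array([[0.5, 1.1, 0.0], [5.7, 3.5, 0.0], [2.5, 2.2, 0.0],
                 [-11.2, 4.5, 0.0], [-8.2, -3.2, 0.0], [-3.2, -8.3, 0.0]])
curr = prev + np.array([1.0, 1.0, 0.0])

# Plain least-squares affine fit: curr ~ [prev | 1] @ M, with M of shape 4x3.
X = np.hstack([prev, np.ones((len(prev), 1))])
M, *_ = np.linalg.lstsq(X, curr, rcond=None)
A = M[:3].T    # 3x3 linear part, so that curr_i ~ A @ prev_i + t
t = M[3]       # translation

# Because every input z is 0, the z column of X is all zeros: the fit cannot
# determine how z maps, and the minimum-norm solution fills in zeros.  That is
# why the estimated matrix ends up with 0 instead of 1 at (3, 3) -- the data
# being coplanar is exactly the degeneracy suspected in the comment.
```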

1 Answer


This problem is much more related to 3D geometry than to image processing.

What you are trying to do is register the known 3D point sets, and since the 3D-points-to-camera relation is the same in every frame, the transformations calculated from the registration will be the camera motion transformations.

In order to solve this you can use PCL, OpenCV's sister project for 3D-related tasks. http://www.pointclouds.org/documentation/tutorials/template_alignment.php#template-alignment is a good tutorial on point cloud alignment.

Basically it goes like this:

For each pair of sequential frames the 3D point correspondences are known, so you can use the SVD method implemented in

http://docs.pointclouds.org/trunk/classpcl_1_1registration_1_1_transformation_estimation_s_v_d.html

You should have at least 3 corresponding points.
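The SVD estimation the answer points to (PCL's TransformationEstimationSVD) is essentially the closed-form Kabsch/Umeyama solution for a rigid transform. As an illustration of the idea, not the PCL API, here is a minimal NumPy sketch:

```python
import numpy as np

def rigid_transform_svd(src, dst):
    """Closed-form least-squares rigid transform (Kabsch/Umeyama) mapping
    src points onto dst points.  src, dst: (N, 3) arrays with N >= 3 and
    known one-to-one correspondences.  Returns a 4x4 homogeneous matrix."""
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection solution (det = -1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```

Unlike a general affine fit, this is constrained to a proper rotation plus translation, which matches rigid camera motion and leaves no shear or scale degrees of freedom to absorb noise.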

You can follow the tutorial or implement your own RANSAC algorithm. This will give you only a rough estimate of the transformation (it can be quite good if the noise is not too big). To get an accurate transformation you should apply the ICP algorithm, using the guess transformation calculated in the previous step as the initial alignment. ICP is described here:

http://www.pointclouds.org/documentation/tutorials/iterative_closest_point.php#iterative-closest-point

These two steps should give you an accurate estimation of the transformation between frames.

So you should do pairwise registration incrementally: register the first pair of frames to get the transformation from the first frame to the second (1->2), register the second with the third (2->3), then compose the 1->2 transformation with 2->3, and so on. This way you will get the transformations in a global coordinate system in which the first frame is the origin.
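The incremental chaining above can be sketched in a few lines of NumPy. The convention assumed here (a transform T_ab maps coordinates in frame a into frame b, acting on column vectors, so composition multiplies on the left) is a guess; if your estimator maps the other way, invert each transform before chaining:

```python
import numpy as np

def make_T(angle_deg, t):
    """4x4 homogeneous transform: rotation about z by angle_deg, then translation t."""
    a = np.radians(angle_deg)
    T = np.eye(4)
    T[:3, :3] = [[np.cos(a), -np.sin(a), 0.0],
                 [np.sin(a),  np.cos(a), 0.0],
                 [0.0,        0.0,       1.0]]
    T[:3, 3] = t
    return T

# Hypothetical pairwise registrations: T_12 maps frame-1 coordinates into
# frame 2, T_23 maps frame-2 coordinates into frame 3.
T_12 = make_T(10, [0.1, 0.0, 0.0])
T_23 = make_T(5,  [0.0, 0.2, 0.0])

# Chained transform straight from frame 1 to frame 3.
T_13 = T_23 @ T_12

# The frame-3 camera's position expressed in frame-1 (global) coordinates:
# map the frame-3 origin back through the inverse transform.
cam3_in_1 = np.linalg.inv(T_13)[:3, 3]
```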

Michael Kupchick
  • Thank you for that answer! I will check how it's done with the PCL library. But as far as I can see, what is the difference between the ICP algorithm and the one I used in OpenCV? Sure, both work differently, but don't both of them return the relative rotation and translation between the frames? The tutorial gives me a 4x4 matrix T. I've tried it just the same with the data I get from the OpenCV function. I also created a 4x4 matrix P, which is the orientation and position. I tried to calculate the trajectory by P_t = P_t-1 * T_t. Unfortunately this gives me impossible results. Any ideas? – Valentino Cazalet Feb 29 '12 at 13:10
  • Try to check the estimateAffine3D function. Generate a set of 3D points then generate the second one with known translation (no rotation). Try to register those sets and see if the results are reasonable. – Michael Kupchick Feb 29 '12 at 13:30
  • Thanks! I added an update to my post above that shows two simple test cases for the estimateAffine3D()... it seems to work for one test case but not for the other... so I guess I gotta try to use the PCL library instead... Or do you have any other ideas? – Valentino Cazalet Mar 01 '12 at 10:49