
I am trying to recover the trajectory of a camera from a sequence of 2D images using OpenCV, but the trajectory I get is not as good as I would like it to be: it goes back and forth instead of just moving forward.

I have a sequence of photos taken with a moving camera (the outdoor part of the KITTI dataset, specifically). For each pair of sequential frames I compute the rotation matrix R and translation vector t with E = cv2.findEssentialMat() and cv2.recoverPose(E, ...), and then I estimate the trajectory, assuming the coordinates of every translation vector are given in a local coordinate system whose orientation is set by the corresponding rotation matrix.

upd: Each recovered position looks like [X, Y, Z], and I scatter (X_i, Y_i) for every i (I take these points to be 2D positions), so the following graphs are my estimated trajectories.
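(For reference, I plot them roughly like this; a minimal sketch, where positions stands for my list of recovered [X, Y, Z] points:)

    import numpy as np
    import matplotlib.pyplot as plt

    positions = np.array(positions)                # shape (N, 3): one [X, Y, Z] per frame
    plt.scatter(positions[:, 0], positions[:, 1])  # (X_i, Y_i) for every i
    plt.scatter(positions[0, 0], positions[0, 1], c='green')   # start
    plt.scatter(positions[-1, 0], positions[-1, 1], c='red')   # end
    plt.show()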

Here's what I get instead of a straight line (the camera was moving straight forward). Previous results were even worse.

The green point is where it starts and the red point is where it ends. So most of the time the camera even moves backwards, though this is probably caused by a mistake at the beginning, which flipped everything around (right?)

Here's what I do:

    E, mask = cv2.findEssentialMat(points1, points2, K_00, cv2.RANSAC, 0.99999, 0.1)
    inliers, R, t, mask = cv2.recoverPose(E, points1, points2, K_00, mask=mask)

It seems to me that recoverPose chooses the wrong sign for R and t on some steps, so the trajectory that was supposed to go forward goes backward, and then forward again.

What I did to improve the situation was:

1) skip the frame pairs with too many outliers (I check this both after findEssentialMat and after recoverPose; see the sketch below);

2) set the RANSAC threshold in findEssentialMat to 0.1;

3) increase the number of feature points per image from 8 to 24.

This didn't really help.
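The outlier check in 1) is roughly this (a minimal sketch, assuming numpy as np and the mask returned above; the 0.5 cutoff is an arbitrary placeholder, which is exactly what question 2 below is about):

    # ratio of inliers among the matched points for this frame pair
    inlier_ratio = np.count_nonzero(mask) / len(points1)
    if inlier_ratio < 0.5:   # arbitrary placeholder cutoff
        continue             # skip this frame pair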

Here I need to note: I know that in practice the five-point algorithm, which is used to compute the essential matrix, needs a lot more points than 8 or even 24, and maybe this is actually the problem.

So the questions are:

1) Can the small number of feature points (approx. 8-24) be the cause of the recoverPose mistakes?

2) If checking the number of outliers is the right thing to do, what percentage of outliers should I set as the limit?

3) I estimate positions like this (instead of the simple p[i+1] = R*p[i] + t):

    C = np.dot(R, C)
    p[i+1] = p[i] + np.dot(np.linalg.inv(C), t)

This is because I can't help thinking of t as a vector in local coordinates, so C is the transformation matrix, updated on every step to accumulate the rotations. Is that right or not really?
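Put together, the whole estimation step looks roughly like this (a minimal, self-contained sketch of what I described above; frames and get_matched_points are hypothetical placeholders for my data loading and feature-point code, and K_00 is the KITTI calibration matrix):

    import numpy as np
    import cv2

    C = np.eye(3)                # accumulated rotation
    positions = [np.zeros(3)]    # recovered positions, starting at the origin

    for frame1, frame2 in zip(frames[:-1], frames[1:]):
        # get_matched_points is a hypothetical placeholder for my feature-point step
        points1, points2 = get_matched_points(frame1, frame2)

        E, mask = cv2.findEssentialMat(points1, points2, K_00, cv2.RANSAC, 0.99999, 0.1)
        inliers, R, t, mask = cv2.recoverPose(E, points1, points2, K_00, mask=mask)

        C = np.dot(R, C)         # accumulate the rotations
        positions.append(positions[-1] + np.dot(np.linalg.inv(C), t).ravel())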

4) It's quite possible that I am missing something, since my knowledge of the topic is tiny. Is there anything (anything!) you could recommend?

Huge thanks for your time! I would appreciate any advice.

upd: for example, here are the first six rotation matrices, translation vectors, and recovered positions I get. Signs of t seem a bit crazy.

upd: here's my code. (I'm not a really good programmer yet.) The main idea is that my feature points are the corners of bounding boxes of static objects, which I detect with Faster R-CNN (I used this implementation). So the first part of the code detects objects, and the second part uses the detected feature points to recover the trajectory.
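The hand-off between the two parts is roughly this (a sketch; boxes1/boxes2 are hypothetical names for matched [x1, y1, x2, y2] detections in two consecutive frames):

    # boxes1 / boxes2: matched detections as [x1, y1, x2, y2] rows (hypothetical names)
    def box_corners(boxes):
        corners = []
        for x1, y1, x2, y2 in boxes:
            corners += [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]  # four corners per box
        return np.array(corners, dtype=np.float64)

    points1 = box_corners(boxes1)
    points2 = box_corners(boxes2)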

Here's the dataset I use (this is part 2011_09_26_drive_0005 from here).

  • What is on the axes in the graphs you provided? – Piotr Siekański Jul 31 '19 at 05:14
  • @PiotrSiekański It's X and Y axis which I take from poses I recover. Each recovered position looks like [X,Y,Z], and I scatter (X_i, Y_i) for every i, so I suppose this graph to be a 2D trajectory. – Olya Agapova Jul 31 '19 at 08:48
  • OK, and what is the Z coordinate? If your camera is moving forward, the Z coordinate is increased while X and Y remain almost the same. What is the range of Z measurements compared to X and Y? – Piotr Siekański Jul 31 '19 at 09:03
  • @PiotrSiekański really? This must be something I'm missing. I thought that the Z coordinate must stay unchanged, since the camera is moving forward, not up. But in fact it really changes a lot. I'll add some of the estimated positions, translation vectors and rotation matrices to my question to illustrate the problem. – Olya Agapova Jul 31 '19 at 09:09
  • Your results look weird, does your camera shake? Please post also your code and the link to the dataset you use to replicate your results. – Piotr Siekański Jul 31 '19 at 09:22
  • @PiotrSiekański I don't think it shakes so much. I posted the code and the dataset. I've read [here](https://www.researchgate.net/publication/304035052_Relative_Camera_Pose_Recovery_and_Scene_Reconstruction_with_the_Essential_Matrix_in_a_Nutshell) that the choice of R and t is a matter of voting, so it really depends on how many feature points I use. The results are really weird, but my only hypothesis is that I should use a lot more feature points. Or maybe the main idea (described in "upd" in my question) is initially wrong. :( – Olya Agapova Jul 31 '19 at 09:43
  • It is better to extract feature points using FAST, SIFT or other feature detectors. It will extract up to several thousands of feature points and then match them and compute essential matrix using RANSAC approach. See this link for details: https://avisingh599.github.io/vision/monocular-vo/ python implementation can be found here: https://github.com/uoip/monoVO-python – Piotr Siekański Jul 31 '19 at 13:31
  • @PiotrSiekański exactly what I was looking for when thinking of more feature points! thank you, I'll try that – Olya Agapova Aug 01 '19 at 06:44
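upd: following the last comment, here is roughly what I am going to try instead of the bounding-box corners (a minimal sketch using ORB; img1 and img2 stand for two consecutive grayscale frames):

    import numpy as np
    import cv2

    orb = cv2.ORB_create(nfeatures=3000)     # several thousand features per frame
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    points1 = np.float64([kp1[m.queryIdx].pt for m in matches])
    points2 = np.float64([kp2[m.trainIdx].pt for m in matches])
    # then findEssentialMat / recoverPose as above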
