Confusion about methods of pose estimation

Question

I'm trying to do pose estimation (actually [Edit: 3DOF] rotation is all I need) from a planar marker with 4 corners = 4 coplanar points.
Up until today I was under the impression from everything I read that you will always compute a homography (e.g. using DLT) and decompose that matrix using the various methods available (Faugeras, Zhang, the analytic method which is also described in this post here on stackexchange) and refine it using non-linear optimization, if necessary.

First minor question: if this is an analytical method (simply taking two columns from a matrix and creating an orthonormal matrix out of these resulting in the desired rotation matrix), what is there to optimize? I've tried it in Matlab and the result jitters badly so I can clearly see the result is not perfect or even sufficient, but I also don't understand why one would want to use the rather expensive and complex SVDs used by Faugeras and Zhang if this simple method yields results already.
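For reference, the "analytic" method mentioned above can be sketched in a few lines of NumPy. This is only a minimal sketch: it assumes a calibrated pinhole camera with a known intrinsic matrix `K` (not mentioned in the question), a marker plane at z = 0 so that H is proportional to K [r1 r2 t], and the function name `rotation_from_homography` is made up for illustration. The SVD here only snaps the result to the nearest rotation matrix; it is not the Faugeras or Zhang decomposition.

```python
import numpy as np

def rotation_from_homography(H, K):
    """Recover R, t from H ~ K [r1 r2 t] (marker plane assumed at z = 0)."""
    A = np.linalg.inv(K) @ H
    # H is only defined up to scale: normalise so that the first two
    # columns have, on average, unit length.
    scale = 2.0 / (np.linalg.norm(A[:, 0]) + np.linalg.norm(A[:, 1]))
    A = A * scale
    # Pick the sign that places the marker in front of the camera (t_z > 0).
    if A[2, 2] < 0:
        A = -A
    r1, r2, t = A[:, 0], A[:, 1], A[:, 2]
    R_approx = np.column_stack([r1, r2, np.cross(r1, r2)])
    # With noisy corners r1 and r2 are neither unit length nor perpendicular,
    # so project onto the nearest rotation matrix (Frobenius norm) via SVD.
    U, _, Vt = np.linalg.svd(R_approx)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt, t

# Synthetic check; intrinsics and pose are illustrative values only.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
ax, ay = np.radians(30.0), np.radians(20.0)
Rx = np.array([[1, 0, 0],
               [0, np.cos(ax), -np.sin(ax)],
               [0, np.sin(ax), np.cos(ax)]])
Ry = np.array([[np.cos(ay), 0, np.sin(ay)],
               [0, 1, 0],
               [-np.sin(ay), 0, np.cos(ay)]])
R_true = Ry @ Rx
t_true = np.array([0.1, -0.05, 2.0])
# Build a noise-free homography with an arbitrary scale and sign.
H = -3.7 * K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R_est, t_est = rotation_from_homography(H, K)
```

With a noise-free H the pose is recovered exactly. Nothing in this procedure minimises the reprojection error of the four corners, though, which is presumably one reason a non-linear refinement is usually run afterwards.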

Then there are iterative pose estimation methods like the Orthogonal Iteration (OI) Algorithm by Lu et al. or the Robust Pose Estimation Algorithm by Schweighofer and Pinz, where there's not even a mention of the word 'homography'. All they need is an initial pose estimate which is then optimized (the reference implementation in Matlab by Schweighofer uses the OI algorithm, for example, which itself uses some method based on SVD).

My problem is: everything I read so far was '4 points? Homography, homography, homography. Decomposition? Well, tricky, in general not unique, several methods.' Now this iterative world opens up and I just cannot connect these two worlds in my head, I don't fully understand their relation. I cannot even articulate properly what my problem is, I just hope someone understands where I am.

I'd be very thankful for a hint or two.

Edit: Is it correct to say: 4 points on a plane and their image are related by a homography, i.e. 8 parameters. Finding the parameters of the marker's pose can be done by calculating and decomposing the homography matrix using Faugeras, Zhang or a direct solution, each with their drawbacks. It can also be done using iterative methods like OI or Schweighofer's algorithm, which at no point calculate the homography matrix, but just use the corresponding points and which require an initial estimation (for which the initial guess from a homography decomposition could be used).
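To make the "4 points, 8 parameters" statement concrete, here is a minimal NumPy sketch of the basic DLT. It is an illustration under assumptions, not a production implementation: the function name `homography_dlt` and the test values are hypothetical, and the coordinate normalisation usually recommended for noisy data is omitted for brevity.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, up to scale) with dst ~ H @ src from >= 4 point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    # Remove the free scale (assumes H[2, 2] != 0, true for this example).
    return H / H[2, 2]

# Marker corners in marker coordinates and a made-up ground-truth homography.
marker = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])
H_true = np.array([[200.0, 20.0, 320.0],
                   [-10.0, 180.0, 240.0],
                   [0.05, 0.02, 1.0]])
proj = (H_true @ np.column_stack([marker, np.ones(4)]).T).T
image = proj[:, :2] / proj[:, 2:3]

H_est = homography_dlt(marker, image)
```

Four correspondences give eight equations, which is exactly enough to fix the eight free parameters of H. The iterative methods instead work directly on the same point correspondences and never form this matrix.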

We have to start with the basics. Do you already have these 4 good corners? If so, there isn't much else to do, the problem is solved already. What happens if you cannot obtain these four points accurately? Now the other approaches make sense, right? – mmgp, Feb 10 '13 at 14:04
Hm, finding these 4 corner points is a matter of image processing, isn't it? I'm using the rectangle detection algorithm used by ARToolKitPlus, so yes, give or take a few pixels of noise, I have the 4 corners. None of these algorithms help me find corners, though; they use these corner coordinates to find the pose. I don't see how an iterative algorithm could possibly improve the accuracy of my corner points anyway, after all the image is all I have. – Garp, Feb 10 '13 at 15:43
I just realized I had not explicitly written that I need the rotation around 3 axes, not just the z-axis (with the marker plane being at z=0), in which case of course the problem would be 'solved already' with 4 corner points. Is this what you were saying? – Garp, Feb 10 '13 at 16:36

score 2 · Answer 1 · answered Feb 10 '13 at 17:13

With only four points your solution will normally be very sensitive to small errors in their location, particularly when the rectangle is nearly orthogonal to the optical axis. This is because the vanishing points are not observable (they are outside the image and very far from the measurements), and the pose is given by the cross product of the vectors from the centre of the quadrangle to the vanishing points.
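The sensitivity can be illustrated numerically. The sketch below (NumPy, with a hypothetical helper `vanishing_points` and made-up pixel coordinates) intersects the opposite sides of the quadrangle in homogeneous coordinates and shows how far the vanishing point travels when a single corner of a nearly fronto-parallel rectangle moves by one pixel.

```python
import numpy as np

def vanishing_points(corners):
    """corners: 4x2 image points listed in order around the quadrangle.
    Returns the homogeneous intersections of the two pairs of opposite sides."""
    p = np.column_stack([np.asarray(corners, dtype=float), np.ones(4)])
    # The line through two points is the cross product of their homogeneous coordinates.
    side01, side12 = np.cross(p[0], p[1]), np.cross(p[1], p[2])
    side23, side30 = np.cross(p[2], p[3]), np.cross(p[3], p[0])
    # The intersection of two lines is the cross product of the lines.
    return np.cross(side01, side23), np.cross(side12, side30)

# Fronto-parallel rectangle: opposite sides are parallel in the image.
rect = [(100, 100), (300, 100), (300, 200), (100, 200)]
v_rect, _ = vanishing_points(rect)          # third coordinate is 0: point at infinity

# The same rectangle with one corner displaced by 1 px and by 2 px.
v_1px, _ = vanishing_points([(100, 100), (300, 101), (300, 200), (100, 200)])
v_2px, _ = vanishing_points([(100, 100), (300, 102), (300, 200), (100, 200)])

x_1px = v_1px[0] / v_1px[2]                 # 20100.0
x_2px = v_2px[0] / v_2px[2]                 # 10100.0
```

In this example a one-pixel change in a single corner moves the vanishing point by 10000 pixels, which is the kind of instability described above.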

Is your pattern such that the corners can be confidently located with subpixel accuracy? I recommend using "checkerboard-type" patterns for the corners, which allow using a good and simple iterative refining algorithm to achieve subpixel accuracy (look up "iterative saddle points algorithm", or look up the docs in OpenCV).

Francesco Callari

Ok, thank you. My pattern is a simple black rectangle on a white background. The algorithm I'm currently using does not allow sub-pixel accuracy, but I've planned to try and use something like the Harris corner detector which Matlab offers out of the box. That was not my problem though, I was confused how all the beginner's code samples, tutorials and also scientific papers were talking about how to compute the homography when apparently it only seems to be truly needed when doing image processing like rectification. I'm beginning to feel like pose estimation from homographies is cumbersome. – Garp Feb 10 '13 at 18:55
The Harris algorithm by itself is not subpixel accurate (although I am not sure what exactly the Matlab implementation does). Its solution is normally used as the starting point in an iterative subpixel refinement algorithm. With only 4 points you cannot do better than estimating the homography - every other solution will be just as unstable. – Francesco Callari Feb 10 '13 at 22:11
Oh, ok, haven't really looked into that yet. I've also been thinking about using the Hough Transform which should give subpixel accuracy if I am not mistaken, but it is expensive. The reason I am not going to use a homography decomposition is that I haven't found a decomposition algorithm yet that deals with pose ambiguities like Schweighofer's does. It looks like I will have to compute the homography matrix nonetheless for reading information stored inside the marker, so obviously I'd rather let that not go to waste, so to speak. – Garp Feb 10 '13 at 22:49

score 0 · Answer 2 · answered Jun 11 '13 at 11:18

I will not provide you with a full answer, but it looks like at least one of the points that need to be clarified is this:

A homography is an invertible mapping from P^2 (homogeneous 3-vectors) to itself, which can always be represented by an invertible 3x3 matrix. Having said that, note that if your 3D points are coplanar you will always be able to use a homography to relate the world points to the image points.

In general, a point in 3-space is represented in homogeneous coordinates as a 4-vector. A projective transformation acting on P^3 is represented by a non-singular 4x4 matrix (15 degrees of freedom: 16 elements minus one for overall scale).

So, the bottom line is that if your model is planar, you can get away with a homography (8 DOF) and an appropriate algorithm, while in the general case the world-to-image mapping is a 3x4 projection matrix (11 DOF), and you would need a different algorithm to estimate it.
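The planar case can be checked numerically. The sketch below assumes a pinhole camera whose 3x4 projection matrix is P = K [R | t] (the intrinsics `K`, the pose and the marker size are made-up illustrative values) and shows that, for points with Z = 0, projecting with P gives the same pixels as applying the 3x3 matrix H = K [r1 r2 t].

```python
import numpy as np

# Illustrative intrinsics and pose (assumptions, not taken from the thread).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
c, s = np.cos(0.4), np.sin(0.4)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, c, -s],
              [0.0, s, c]])                        # rotation about the x-axis
t = np.array([0.1, -0.2, 3.0])

P = K @ np.column_stack([R, t])                    # full 3x4 projection matrix
H = K @ np.column_stack([R[:, 0], R[:, 1], t])     # 3x3: third column of R dropped

def dehomogenize(x):
    """Convert a homogeneous 3-vector to 2D pixel coordinates."""
    return x[:2] / x[2]

# Marker corners in the plane Z = 0.
plane_pts = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
via_P = np.array([dehomogenize(P @ np.array([X, Y, 0.0, 1.0])) for X, Y in plane_pts])
via_H = np.array([dehomogenize(H @ np.array([X, Y, 1.0])) for X, Y in plane_pts])
```

Because Z = 0 multiplies the third column of R, that column never contributes, and the 3x4 matrix collapses to a 3x3 homography for coplanar points.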

Hope this helps,

Alex
